SYSTEM AND PROCESS FOR DETECTING OUTLIERS FOR INSURANCE
UNDERWRITING SUITABLE FOR USE BY AN AUTOMATED SYSTEM

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a system and process for underwriting insurance
applications, and more particularly to a system and process for underwriting insurance
applications based on the detection and identification of outlier applications.

2. Description of Related Art 

Classification is the process of assigning an input pattern to one of a predefined set of
classes. Classification problems exist in many real-world applications, such as
medical diagnosis, machine fault diagnosis, handwriting character recognition,
fingerprint recognition, and credit scoring, to name a few. Broadly speaking,
classification problems can be categorized into two types: dichotomous classification
and polychotomous classification. Dichotomous classification deals with two-class
classification problems, while polychotomous classification deals with classification
problems that have more than two classes.

Classification consists of developing a functional relationship between the input
features and the target classes. Accurately estimating such a relationship is key to the
success of a classifier. Insurance underwriting is one of these classification problems.
The underwriting process consists of assigning a given insurance application,
described by its medical and demographic records, to one of the risk categories (also
referred to as rate classes). A trained individual or individuals traditionally perform
insurance underwriting. A given application for insurance (also referred to as an
"insurance application") may be compared against a plurality of underwriting
standards set by an insurance company. The insurance application may be classified
into one of a plurality of risk categories available for a type of insurance coverage
requested by an applicant. The risk categories then affect the premium paid by the
applicant, e.g., the higher the risk category, the higher the premium. A decision to accept
or reject the application for insurance may also be part of this risk classification, as
risks above a certain tolerance level set by the insurance company may simply be
rejected.

PCT/US2004/008256

WO 2004/099943

Insurance underwriting often involves the use of a large number of features in the
decision-making process. The features typically include the physical conditions,
medical information, and family history of the applicant. Further, insurance
underwriting frequently has a large number of risk categories (rate classes). The risk
category of an insurance application is traditionally determined by using a number of
rules/standards, which have the form of, for example, "if the value of feature x
exceeds a, then the application can't be rate class C, i.e., the application has to be
lower than C". Such manual underwriting, however, is not only time-consuming, but
also often inadequate in consistency and reliability. The inadequacy becomes more
apparent as the complexity of insurance applications increases.
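
As a rough illustration of the rule form quoted above, the following Python sketch applies a list of threshold rules, each of which caps the best rate class an application may receive. The rate class names, feature names, and thresholds are hypothetical and are not taken from any underwriting standard; the sketch only shows the "if feature x exceeds a, the application has to be lower than C" pattern.

```python
# Hypothetical rate classes, ordered from best (lowest premium) to worst.
RATE_CLASSES = ["Preferred", "Select", "Standard", "Decline"]

# Hypothetical rules: (feature name, threshold a, best class still allowed
# once the feature value exceeds the threshold).
RULES = [
    ("cholesterol", 240.0, "Select"),
    ("bmi", 32.0, "Standard"),
]


def best_allowed_class(application):
    """Return the best rate class that no rule excludes for this application."""
    cap_index = 0  # index into RATE_CLASSES; 0 means no restriction yet
    for feature, threshold, cap in RULES:
        # A missing feature is treated as not exceeding the threshold (assumption).
        if application.get(feature, 0.0) > threshold:
            cap_index = max(cap_index, RATE_CLASSES.index(cap))
    return RATE_CLASSES[cap_index]
```

Each rule can only push the result toward a worse class, so the order in which rules are evaluated does not matter in this sketch.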

There can be a large amount of variability in the insurance underwriting process when
individual underwriters perform it. Typically, underwriting standards cannot cover all
possible cases and variations of an application for insurance. The underwriting
standards may even be self-contradictory or ambiguous, leading to an uncertain
application of the standards. The subjective judgment of the underwriter will almost
always play a role in the process. Variation in factors such as underwriter training
and experience, and a multitude of other effects, can cause different underwriters to
issue different, inconsistent decisions. Sometimes these decisions can be in
disagreement with the established underwriting standards of the insurance company,
while sometimes they can fall into a "gray area" not explicitly covered by the
underwriting standards.

Further, there may be an occasion in which an underwriter's decision could still be
considered correct, even if it disagrees with the written underwriting standards. This
situation can be caused when the underwriter uses his/her own experience to
determine whether the underwriting standards should be adjusted. Different
underwriters may make different determinations about when these adjustments are
allowed, as they might apply stricter or more liberal interpretations of the
underwriting standards. Thus, the judgment of experienced underwriters may be in
conflict with the desire to consistently apply the underwriting standards.

Other drawbacks may also exist.

SUMMARY OF THE INVENTION

By way of an exemplary embodiment of the invention, a process for preparing an
insurance application for underwriting based on a plurality of previous insurance
application underwriting decisions is disclosed. The process includes receiving a
request to underwrite the insurance application, assigning a risk classification to the
insurance application, defining a set comprising at least one of the plurality of
previous insurance application underwriting decisions, comparing the insurance
application to the set, and designating the insurance application based at least in part
on the comparison between the insurance application and the set and the risk
classification assigned to the insurance application, where the designation is one of
designating the insurance application as an outlier insurance application and
designating the insurance application for underwriting.

By way of another exemplary embodiment, a process for preparing an insurance
application for underwriting based on a plurality of previous insurance application
underwriting decisions includes receiving a request to underwrite the insurance
application, wherein the insurance application comprises at least one feature,
assigning a risk classification to the insurance application, and defining a set
comprising at least one of the plurality of previous insurance application underwriting
decisions. The process further includes comparing the insurance application to the
set, where comparing comprises comparing at least one feature of the insurance
application to a corresponding feature in the at least one of the plurality of previous
insurance application underwriting decisions in the set, comparing the classification
assignment of the insurance application to the classification assignment of the at least
one of the plurality of previous insurance application underwriting decisions in the
set, and designating the insurance application based at least in part on the comparison
between the insurance application and the set and the risk classification assigned to
the insurance application, where the designation is one of designating the insurance
application as an outlier insurance application and designating the insurance
application for underwriting.
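
The steps recited above (define a set of previous decisions, compare features, compare classification assignments, designate) can be sketched in Python as follows. This is only one possible reading under stated assumptions: the set is taken to be the k previous decisions nearest to the new application in Euclidean feature distance, and the application is designated an outlier when too few of them share its assigned risk classification. The names `designate`, `k`, and `min_agreement` are hypothetical and do not come from the specification.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def designate(features, risk_class, previous, k=3, min_agreement=0.5):
    """Designate an application as an outlier or for underwriting.

    previous: non-empty list of (features, risk_class) pairs taken from
    earlier underwriting decisions. k and min_agreement are illustrative
    parameters, not values from the source.
    """
    # Define the set: the k previous decisions closest in feature space.
    nearest = sorted(previous, key=lambda d: distance(features, d[0]))[:k]
    # Compare classification assignments within that set.
    agree = sum(1 for _, cls in nearest if cls == risk_class)
    if agree / len(nearest) < min_agreement:
        return "outlier"
    return "underwrite"
```

For example, an application assigned class "C" whose nearest previous decisions were all assigned class "A" would be flagged as an outlier rather than passed on for underwriting.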

According to another example of an embodiment of the invention, a computer
readable medium having code for causing a processor to prepare an insurance
application for underwriting based on a plurality of previous insurance application
underwriting decisions is described. The medium includes code for receiving a
request to underwrite the insurance application, code for assigning a risk classification
to the insurance application, code for defining a set comprising at least one of the
plurality of previous insurance application underwriting decisions, code for
comparing the insurance application to the set, and code for designating the insurance
application based at least in part on the comparison between the insurance application
and the set and the risk classification assigned to the insurance application, where the
designation is one of designating the insurance application as an outlier insurance
application and designating the insurance application for underwriting.

According to a further exemplary embodiment, a computer readable medium having
code for causing a processor to prepare an insurance application for underwriting
based on a plurality of previous insurance application underwriting decisions includes
code for receiving a request to underwrite the insurance application, wherein the
insurance application comprises at least one feature, code for assigning a risk
classification to the insurance application, code for defining a set comprising at least
one of the plurality of previous insurance application underwriting decisions, and
code for comparing the insurance application to the set, where the code for comparing
comprises code for comparing at least one feature of the insurance application to a
corresponding feature in the at least one of the plurality of previous insurance
application underwriting decisions in the set, and code for comparing the classification
assignment of the insurance application to the classification assignment of the at least
one of the plurality of previous insurance application underwriting decisions in the
set. The exemplary embodiment further includes code for designating the insurance
application based at least in part on the comparison between the insurance application
and the set and the risk classification assigned to the insurance application, where the
designation is one of designating the insurance application as an outlier insurance
application and designating the insurance application for underwriting.

By way of a further exemplary embodiment, a system for preparing an insurance
application for underwriting based on a plurality of previous insurance application
underwriting decisions includes means for receiving a request to underwrite the
insurance application, means for assigning a risk classification to the insurance
application, means for defining a set comprising at least one of the plurality of
previous insurance application underwriting decisions, means for comparing the
insurance application to the set, and means for designating the insurance application
based at least in part on the comparison between the insurance application and the set
and the risk classification assigned to the insurance application, where the designation
is one of designating the insurance application as an outlier insurance application and
designating the insurance application for underwriting.

According to another embodiment of the present invention, a system for preparing an
insurance application for underwriting based on a plurality of previous insurance
application underwriting decisions is described. The system includes means for
receiving a request to underwrite the insurance application, wherein the insurance
application comprises at least one feature, means for assigning a risk classification to
the insurance application, and means for defining a set comprising at least one of the
plurality of previous insurance application underwriting decisions. In addition, the
system also includes means for comparing the insurance application to the set, where
the code for comparing comprises means for comparing at least one feature of the
insurance application to a corresponding feature in the at least one of the plurality of
previous insurance application underwriting decisions in the set, means for comparing
the classification assignment of the insurance application to the classification
assignment of the at least one of the plurality of previous insurance application
underwriting decisions in the set, and means for designating the insurance application
based at least in part on the comparison between the insurance application and the set
and the risk classification assigned to the insurance application, where the designation
is one of designating the insurance application as an outlier insurance application and
designating the insurance application for underwriting.

By way of another exemplary embodiment of the present invention, a system for
preparing an insurance application for underwriting based on a plurality of previous
insurance application underwriting decisions includes a receiver for receiving a
request to underwrite the insurance application, a classifier module for assigning a risk
classification to the insurance application, a definition module for defining a set
comprising at least one of the plurality of previous insurance application underwriting
decisions, a comparison module for comparing the insurance application to the set, and a
designation module for designating the insurance application based at least in part on
the comparison between the insurance application and the set and the risk
classification assigned to the insurance application, where the designation is one of
designating the insurance application as an outlier insurance application and
designating the insurance application for underwriting.

According to a further exemplary embodiment of the invention, a system for
preparing an insurance application for underwriting based on a plurality of previous
insurance application underwriting decisions includes a receiver for receiving a
request to underwrite the insurance application, wherein the insurance application
comprises at least one feature, a classifier module for assigning a risk classification to
the insurance application, a definition module for defining a set comprising at least
one of the plurality of previous insurance application underwriting decisions, a
comparison module for comparing the insurance application to the set, comparing at
least one feature of the insurance application to a corresponding feature in the at least
one of the plurality of previous insurance application underwriting decisions in the
set, and comparing the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set, and a designation module for
designating the insurance application based at least in part on the comparison between
the insurance application and the set and the risk classification assigned to the
insurance application, where the designation is one of designating the insurance
application as an outlier insurance application and designating the insurance
application for underwriting.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates the architecture of a quality assurance system based on the fusion
of multiple classifiers according to an embodiment of the invention.

Figure 2 illustrates a table of an outer product using the function T(x,y) according to
an embodiment of the invention.

Figure 3 illustrates the disjointed rate classes within the universe of rate classes
according to an embodiment of the invention.

Figure 4 illustrates the results of the intersections of the rate classes and the universe
according to an embodiment of the invention.

Figures 5-9 illustrate the results of T-norm operators according to an embodiment of
the invention.

Figures 10-14 illustrate the normalized results of T-norm operators according to an
embodiment of the invention.

Figure 15 illustrates a summary of the fusion of two classifiers according to an
embodiment of the invention.

Figure 16 illustrates a penalty matrix for a fusion module according to an embodiment
of the invention.

Figure 17 illustrates a summary of the fusion of two classifiers with disagreement
according to an embodiment of the invention.

Figure 18 illustrates a summary of the fusion of two classifiers with agreement and
discounting according to an embodiment of the invention.

Figures 19-23 illustrate the results of T-norm operators according to an embodiment
of the invention.

Figures 24-28 illustrate the normalized results of T-norm operators according to an
embodiment of the invention.

Figure 29 illustrates a Dempster-Shafer penalty matrix according to an embodiment
of the invention.

Figure 30 illustrates a comparison matrix according to an embodiment of the 
invention. 

Figure 31 illustrates fusion as a function of a confidence threshold for non-nicotine 
cases according to an embodiment of the invention. 

Figure 32 illustrates fusion as a function of a confidence threshold for nicotine cases 
according to an embodiment of the invention. 

Figure 33 illustrates a Venn diagram for fusion for non-nicotine cases according to an
embodiment of the invention.

Figure 34 illustrates a Venn diagram for fusion for nicotine cases according to an
embodiment of the invention.

Figure 35 is a flowchart that illustrates an outlier detector according to an
embodiment of the invention.

Figure 36 illustrates an outlier detector used in quality assurance according to an 
embodiment of the invention. 

Figure 37 illustrates a plot of two features for insurance applications according to an
embodiment of the invention. 

Figure 38 is a flowchart that illustrates a tuning process according to an embodiment 
of the invention. 

Figure 39 is a flowchart that illustrates a classification process according to an
embodiment of the invention.

Figure 40 illustrates a comparison matrix according to an embodiment of the
invention.

Figure 41 illustrates a comparison matrix for a modified process according to an
embodiment of the invention.

Figure 42 is a flowchart that illustrates a multi-variate adaptive regression splines
("MARS") process according to an embodiment of the invention.

Figure 43 is a histogram that illustrates decision boundaries according to an
embodiment of the invention.

Figure 44 illustrates a parallel network implementation according to an embodiment
of the invention.

Figure 45 illustrates a comparison matrix according to an embodiment of the
invention.

Figure 46 illustrates an annotated comparison matrix according to an embodiment of
the invention.

Figure 47 illustrates a performance of MARS models using five partitions according
to an embodiment of the invention.

Figure 48 illustrates minimum, maximum, and average performances of a network of
MARS models according to an embodiment of the invention.

Figure 49 illustrates [illegible in source]
according to an embodiment of the invention.

Figure 50 illustrates a multi-class neural network decomposed into multiple binary
classifiers according to an embodiment of the invention.

Figure 51 illustrates an architecture for a neural network classifier according to an
embodiment of the invention.

Figure 52 illustrates a confusion matrix before post-processing according to an
embodiment of the invention.

Figure 53 illustrates a confusion matrix after post-processing according to an
embodiment of the invention.

Figure 54 illustrates performance before post-processing according to an embodiment
of the invention.

Figure 55 illustrates performance after post-processing according to an embodiment
of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

A system and process for underwriting of insurance applications that is suitable for
use by a computer rather than by human intervention is described. The system and
process make use of existing risk assignments made by human underwriters to
categorize new applications in terms of the risk involved. One technical effect of the
invention is to provide an automated process for consistent and accurate underwriting
decisions for insurance applications. Various aspects and components of this system
and process are described below.

It will be recognized, however, that the principles disclosed herein may extend
beyond the realm of insurance underwriting and that they may be applied to any risk
classification process, of which the determination of the proper premium to cover a
given risk (i.e., insurance underwriting) is just an example. Therefore the ultimate
domain of this invention may be considered risk classification.

1. Fusion Module 

An aspect of the invention provides a system and process for fusing a collection of
classifiers used for an automated insurance underwriting system and/or its quality
assurance. While the design method is demonstrated for quality assurance of
automated insurance underwriting, it is broadly applicable to diverse decision-making
applications in business, commercial, and manufacturing processes. A process of
fusing the outputs of a collection of classifiers is provided. The fusion can
compensate for the potential correlation among the classifiers. The reliability of each
classifier can be represented by a static or dynamic discounting factor, which will
reflect the expected accuracy of the classifier. A static discounting factor represents a
prior expectation about the classifier's reliability, e.g., it might be based on the
average past accuracy of the model. A dynamic discounting factor represents a conditional
assessment of the classifier's reliability, e.g., whenever a classifier bases its output on
an insufficient number of points, the result is not reliable. Hence, this factor could be
determined from the post-processing stage in each model. The fusion of the data will
typically result in some amount of consensus and some amount of conflict among the
classifiers. The consensus will be measured and used to estimate a degree of
confidence in the fused decisions.

According to an embodiment of the invention, a fusion module (also referred to as a
fusion engine) combines the outputs of several decision engines (also referred to as
classifiers or components of the fusion module) to determine the correct rate class for
an insurance application. Using a fusion module with several decision engines may
enable a classification to be assigned with a higher degree of confidence than is
possible using any single model. According to an embodiment of the invention, a
fusion module function may be part of a quality assurance ("QA") process to test and
monitor a production decision engine ("PDE") that makes the rate class assignment in
real-time. At periodic intervals, e.g., every week, the fusion module and its
components may review the decisions made by the PDE during the previous week.
The output of this review will be an assessment of the PDE performance over that
week, as well as the identification of cases with different levels of decision quality.

The fusion module may permit the identification of the best cases of application
classification, e.g., those with high-confidence, high-consensus decisions. These best
cases in turn may be likely candidates to be added to the set of test cases used to tune
the PDE. Further, the fusion module may permit the identification of the worst cases
of application classification, e.g., those with low-confidence, low-consensus
decisions. These worst cases may be likely candidates to be selected for a review by
an auditing staff and/or by senior underwriters.

A fusion module may also permit the identification of unusual cases of application
classification, e.g., those with unknown confidence in their decisions, for which the
models in the fusion module could not make any strong commitment or avoided the
decision by routing the insurance application to a human underwriter. These cases
may be candidates for a blind review by senior underwriters. In addition, a fusion
module may also permit an assessment of the performance of the PDE, by monitoring
the PDE accuracy and variability over time, such as monitoring the statistics of low,
borderline and high quality cases as well as the occurrence of unusual cases. These
statistics can be used as indicators for risk management.
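
The routing of best, worst, and unusual cases described above can be sketched as a small triage function. The numeric thresholds and the use of `None` to represent an unknown confidence are assumptions made for illustration only; the text does not fix particular values or representations.

```python
from collections import Counter
from typing import Optional


def triage(confidence: Optional[float], low: float = 0.3, high: float = 0.8) -> str:
    """Route a reviewed case by the confidence of the fused decision.

    confidence is None when the models could not commit to a decision.
    The low/high thresholds are hypothetical.
    """
    if confidence is None:
        return "blind review by senior underwriters"
    if confidence >= high:
        return "candidate test case for tuning the PDE"
    if confidence <= low:
        return "review by auditors or senior underwriters"
    return "borderline, no action"


def weekly_statistics(confidences):
    """Count how many of the week's cases fall into each triage bucket."""
    return Counter(triage(c) for c in confidences)
```

Tracking the counts returned by `weekly_statistics` from one review period to the next is one simple way to monitor the variability of the production engine over time.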

According to an embodiment of the invention, a fusion module may leverage the fact
that except for the unusual situation where all the components (e.g., models) contain
the same information (e.g., an extreme case of positive correlation), each component
should provide additional information. This information may either corroborate or
refute the output of the other modules, thereby supporting either a measure of
consensus, or a measure of conflict. These measures may define a confidence in the
result of the fusion. In general, the fusion of the components' decisions may provide
a more accurate assessment than the decision of each individual component.

The fusion module is described in relation to various types of decision engines,
including a case-based decision engine, a dominance-based decision engine, a multi-variate
adaptive regression splines engine, and a neural network decision engine,
respectively. However, the fusion module may use any type of decision engine.
According to an embodiment of the invention, the fusion module will support a
quality assurance process for a production decision engine. However, it is understood
that the fusion module could be used for a quality assurance process for any other
decision making process, including a human underwriter.

According to an embodiment of the invention, a general method for the fusion
process, which can be used with classifiers that may exhibit any kind of (positive,
neutral, or negative) correlation with each other, may be based on the concept of
triangular norms ("T-norms"), a multi-valued logic generalization of the Boolean
intersection operator. The fusion of multiple decisions, produced by multiple sources,
regarding objects (e.g., classes) defined in a common framework (e.g., the universe of
discourse) consists of determining the underlying degree of consensus for each
object (e.g., class) under consideration, i.e., the intersections of their decisions. With
the intersections of multiple decisions, possible correlation among the sources needs
to be taken into account to avoid under-estimates or over-estimates. This is done by
the proper selection of a T-norm operator.

According to an embodiment of the invention, each model is assumed to be solving
the same classification problem. Therefore, the output of each classifier is a weight
assignment that represents the degree to which a given class is selected. The set of all
possible classes, referred to as U, represents the common universe of all answers that
can be considered by the classifiers. The assignment of weights to this universe
represents the classifier's ignorance (i.e., lack of commitment to a specific decision).
This is a discounting mechanism that can be used to represent the classifier's
reliability.
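
One way to picture this discounting mechanism is the sketch below, which scales a classifier's class weights by a reliability factor and moves the remaining weight onto the universe U. The dictionary representation, the reserved key "U", and the assumption that weights sum to 1 are illustrative choices, not details given in the text.

```python
def discount(weights, reliability):
    """Discount a classifier's weight assignment.

    weights: dict mapping each rate class (and the key "U" for the whole
    universe) to a weight; the weights are assumed to sum to 1.
    reliability: factor in [0, 1]; 1 keeps the assignment unchanged and
    0 moves all weight to "U" (complete ignorance).
    """
    discounted = {c: reliability * w for c, w in weights.items() if c != "U"}
    # Whatever is no longer committed to a specific class goes to the universe.
    discounted["U"] = 1.0 - sum(discounted.values())
    return discounted
```

A static factor would pass the same `reliability` for every case, whereas a dynamic factor would compute it per case, for instance lowering it when a model based its output on too few data points.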

According to an embodiment of the invention, the outputs of the classifiers may be
combined by selecting the generalized intersection operator (e.g., the T-norm) that
better represents the possible correlation between the classifiers. With this operator,
the assignments of the classifiers are intersected and a derived measure of consensus
is computed. This fusion may be performed in an associative manner, i.e., the output
of the fusion of the first two classifiers is combined with the output of the third
classifier, and so on, until all available classifiers have been considered. At this stage,
the final output may be normalized (e.g., showing the degree of selection as a
percentage). Further, the strongest selection of the fusion may be identified and
qualified with its associated degree of confidence.

Thus, according to an embodiment of the invention, a fusion module only considers
weight assignments made either to disjoint subsets that contain a singleton (e.g., a rate
class) or to the entire universe of classes U (i.e., the entire set of rate classes), as will
be described in greater detail below. Once compensation has been made for
correlation and fusion has been performed, the degree of confidence C is computed
among the classifiers and used to qualify the decision obtained from the fusion.
Further, the confidence measure and the agreement or disagreement of the fusion
module's decision is used with the production engine's decision to assess the quality
of the production engine. As a by-product, the application cases may be labeled in
terms of the decision confidence. Thus, cases with low, high, or unknown confidence
may be used in different ways to maintain and update the production engine.
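
A minimal sketch of fusing two such weight assignments is given below. It forms the outer product of the two assignments with a chosen T-norm and accumulates each product on the intersection of the two subsets, where the intersection of a class with itself or with U is that class, the intersection of U with U is U, and the intersection of two different classes is empty. Treating the empty-intersection weight as a separate conflict total, and normalizing the remaining weights to sum to 1, are assumptions of this sketch; the computation of the confidence C is not reproduced here.

```python
def fuse(m1, m2, tnorm):
    """Fuse two weight assignments over singleton classes and the universe "U".

    Returns (fused weights, total weight assigned to empty intersections).
    """
    fused = {}
    conflict = 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            w = tnorm(x, y)
            if a == b:
                key = a          # class with itself, or U with U
            elif a == "U":
                key = b          # U intersected with a class is that class
            elif b == "U":
                key = a
            else:
                conflict += w    # two different classes: empty intersection
                continue
            fused[key] = fused.get(key, 0.0) + w
    return fused, conflict


def normalize(fused):
    """Rescale the fused weights so that they sum to 1 (assumed normalization)."""
    total = sum(fused.values())
    return {k: v / total for k, v in fused.items()}


# Illustrative use with the product T-norm and made-up weights.
first = {"A": 0.6, "B": 0.3, "U": 0.1}
second = {"A": 0.5, "B": 0.4, "U": 0.1}
fused, conflict = fuse(first, second, lambda x, y: x * y)
strongest = max(fused, key=fused.get)
```

Passing a different T-norm as the third argument changes how strongly overlapping support from the two classifiers is counted, which is how the choice of operator can reflect their assumed correlation.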

Other types of aggregation could be used, but would need to be associative,
compensate for correlation, accommodate the discounting of classifiers, and generate
a confidence measure of the combined decision, properties that other aggregation
methods do not directly satisfy. A particular case may be a Dempster-Shafer ("DS") fusion rule. The DS
fusion rule requires the classifiers to be evidentially independent, i.e., the errors of
one classifier must be uncorrelated with those of another one. Furthermore, the DS
paradigm does not allow us to represent the ordering among the classes, typical of the
insurance underwriting process. This ordering implies that there could be minor
differences (such as the selection of two adjacent classes) and major differences (such
as the selection of different classes at the extremes of their range). Therefore, the
conflict between two sources is a gradual one, rather than a binary one (hit/miss).
Finally, in DS theory, the classifiers' outputs are considered probability assignments.

Triangular norms (T-norms) and triangular conorms (T-conorms) are the most
general families of binary functions that satisfy the requirements of the conjunction
and disjunction operators, respectively. T-norms T(x,y) and T-conorms S(x,y) are
two-place functions that map the unit square into the unit interval, i.e., T(x,y):
[0,1] x [0,1] -> [0,1] and S(x,y): [0,1] x [0,1] -> [0,1]. They are monotonic,
commutative and associative functions. Their corresponding boundary conditions,
i.e., the evaluation of the T-norms and T-conorms at the extremes of the [0,1] interval,
satisfy the truth tables of the logical AND and OR operators. They are related by the
DeMorgan duality, which states that if N(x) is a negation operator, then the T-conorm
S(x,y) can be defined as S(x,y) = N(T(N(x), N(y))).
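
To make the duality concrete, the following sketch defines three well-known T-norms (minimum, product, and the Lukasiewicz T-norm) and derives each dual T-conorm from S(x,y) = N(T(N(x), N(y))), assuming the standard negation N(x) = 1 - x. The choice of these three operators is illustrative; the text does not single them out at this point.

```python
def t_min(x, y):
    """Minimum T-norm."""
    return min(x, y)


def t_product(x, y):
    """Product T-norm."""
    return x * y


def t_lukasiewicz(x, y):
    """Lukasiewicz (bounded difference) T-norm."""
    return max(0.0, x + y - 1.0)


def negation(x):
    """Standard negation N(x) = 1 - x (assumed here)."""
    return 1.0 - x


def dual_conorm(tnorm):
    """Build the T-conorm S(x, y) = N(T(N(x), N(y))) for a given T-norm."""
    def s(x, y):
        return negation(tnorm(negation(x), negation(y)))
    return s


s_max = dual_conorm(t_min)           # behaves like max(x, y)
s_prob_sum = dual_conorm(t_product)  # behaves like x + y - x*y
```

At the corners of the unit square every T-norm above reproduces the AND truth table and every derived T-conorm reproduces the OR truth table, which is the boundary condition stated in the paragraph.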

As described in Bonissone and Decker (1986), the contents of which are incorporated
by reference in their entirety, six parameterized families of T-norms and their dual
T-conorms may be used. Of the six parameterized families, one family was selected due
to its complete coverage of the T-norm space and its numerical stability. This family
has a parameter p. By selecting different values of p, T-norms with different
properties can be instantiated, and thus may be used in the fusion of possibly
correlated classifiers.
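
The passage above does not name the selected family or give its formula. Purely as an illustration of how one parameter can sweep through T-norms with different properties, the sketch below implements the Schweizer-Sklar family in its common textbook parameterization; both the choice of this family and the sign convention for p are assumptions and may differ from the one used in the cited work.

```python
def schweizer_sklar(x, y, p):
    """Schweizer-Sklar T-norm in a common textbook parameterization.

    p = 1 gives the Lukasiewicz T-norm, p = 0 the product, and large
    negative p approaches the minimum. Inputs are assumed to lie in [0, 1].
    """
    if p == 0:
        return x * y
    if p < 0:
        # Avoid raising zero to a negative power; T(0, y) = T(x, 0) = 0.
        if x == 0.0 or y == 0.0:
            return 0.0
        # Note: extremely small inputs with large |p| can overflow a float.
        return (x ** p + y ** p - 1.0) ** (1.0 / p)
    return max(0.0, x ** p + y ** p - 1.0) ** (1.0 / p)
```

With this convention, moving p from positive values through zero toward large negative values moves the operator from the Lukasiewicz T-norm through the product toward the minimum.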

Various articles discuss the fusion and the different features associated therewith,
including proofs as to the development of algorithms associated with the present
invention. Chibelushi et al. (Chibelushi, C. C., Deravi, F., and Mason, J. S. D.,
"Adaptive Classifier Integration for Robust Pattern Recognition," IEEE Transactions
on Systems, Man, and Cybernetics, vol. 29, no. 6, 1999, the contents of which are
incorporated herein by reference) describe a linear combination method for combining
the outputs of multiple classifiers used in speaker identification applications.

Fairhurst and Rahman (Fairhurst, M. C., and Rahman, A. F. R., "Enhancing
consensus in multi expert decision fusion," IEE Proc.-Vis. Image Signal Process., vol.
147, no. 1, 2000, the contents of which are incorporated herein by reference) describe
ENCORE, a multi-classifier fusion system for enhancing the performance of
individual classifiers for pattern recognition tasks, specifically, the task of
handwritten digit recognition. Kuncheva and Jain (Kuncheva, L. I., and Jain, L. C.,
"Designing Classifier Fusion Systems by Genetic Algorithms," IEEE Transactions on
Evolutionary Computation, vol. 4, no. 4, 2000, the contents of which are incorporated
herein by reference) describe a genetic algorithm approach to the design of fusion of
multiple classifiers.

Xu et al. (Xu, L., Krzyzak, A., and Suen, C. Y., "Methods of Combining Multiple
Classifiers and Their Applications to Handwriting Recognition," IEEE Transactions
on Systems, Man, and Cybernetics, vol. 22, no. 3, 1992, the contents of which are
incorporated herein by reference) describe several standard approaches for classifier
decision fusion, including the Dempster-Shafer approach, and demonstrate fusion for
handwritten character recognition.

Arthur Dempster (A. P. Dempster, "Upper and lower probabilities induced by a
multivalued mapping," Annals of Mathematical Statistics, 38:325-339, 1967, the
contents of which are incorporated herein by reference) describes a calculus based on
lower and upper probability bounds. Dempster's rule of combination describes the
pooling of sources under the assumption of evidential independence. Glenn Shafer
(G. Shafer, "A Mathematical Theory of Evidence", Princeton University Press,
Princeton, New Jersey, 1976, the contents of which are incorporated herein by
reference) describes the same calculus discovered by Dempster, but starting from a set
of super-additive belief functions that are essentially lower bounds. Shafer derives
the same rule of combination as Dempster. Enrique Ruspini (E. Ruspini, "Epistemic
logic, probability, and the calculus of evidence," Proc. Tenth Intern. Joint Conf. on
Artificial Intelligence, Milan, Italy, 1987, the contents of which are incorporated
herein by reference) goes on to describe a possible-world semantics for
Dempster-Shafer theory.

B. Schweizer and A. Sklar (B. Schweizer and A. Sklar, "Associative Functions and
Abstract Semi-Groups", Publicationes Mathematicae Debrecen, 10:69-81, 1963, the
contents of which are incorporated herein by reference) describe a parametric family
of triangular T-norm functions that generalize the concept of intersection in
multiple-valued logics. Piero Bonissone and Keith Decker (P. P. Bonissone and K. Decker,
"Selecting Uncertainty Calculi and Granularity: An Experiment in Trading-off
Precision and Complexity" in Kanal and Lemmer (editors), Uncertainty in Artificial
Intelligence, pages 217-247, North-Holland, 1986, the contents of which are
incorporated herein by reference) describe an experiment based on Schweizer and
Sklar's parameterized T-norms. They show how five triangular norms can be used to
represent an infinite number of T-norms for some practical values of information
granularity. Piero Bonissone (P. P. Bonissone, "Summarizing and Propagating
Uncertain Information with Triangular Norms", International Journal of Approximate
Reasoning, 1(1):71-101, January 1987, the contents of which are incorporated herein
by reference) also describes the use of triangular norms in dealing with uncertainty in
expert systems. Specifically, he shows the use of triangular norms to aggregate the
uncertainty in the left-hand side of production rules and to propagate it through the
firing and chaining of production rules.

Fig. 1 illustrates the architecture of a quality assurance system based on the fusion of
multiple classifiers according to an embodiment of the invention. These classifiers
may include a case-based reasoning model (described in U.S. Patent Application Serial
Nos. 10/170,471 and 10/171,190, the contents of which are incorporated herein by
reference), a multivariate adaptive regression splines model (hereinafter also referred
to as "MARS"), a neural network model and a dominance-based model. The MARS,
neural network, and dominance-based models are all described in greater detail
below.

System 100, as illustrated in Fig. 1, includes a number of quality assurance decision
engines 110. In the embodiment illustrated in Fig. 1, the quality assurance decision
engines 110 comprise a case-based reasoning decision engine 112, a MARS decision
engine 114, a neural network decision engine 116, and a dominance-based decision
engine 118. It is understood, however, that other types of quality assurance decision
engines 110 could be used in addition to and/or as substitutes for those listed in the
embodiment of the invention illustrated in Fig. 1.

Post processing modules 122, 124, 126, and 128 receive the outputs from the various
quality assurance decision engines 120 and perform processing on the outputs. The
results of the post-processing are input into a multi-classifier fusion module 130. The
multi-classifier fusion module 130 then outputs a fusion rate class decision 135 and a
fusion confidence measure 140, which are input into comparison module 150.

A fuzzy logic rule-based production engine 145 outputs a production rate class
decision 147 and a production confidence measure 149, which are then input into
comparison module 150. After a comparison between the production
rate class decision 147 and the fusion rate class decision 135, and the production
confidence measure 149 and the fusion confidence measure 140, a compared rate
class decision 151 and a compared confidence measure 153 are output by comparison
module 150. An evaluation module 155 evaluates the case confidence and consensus
regarding the compared rate class 151 and the compared confidence measure 153.
Those cases evaluated as "worst cases" are stored in case database 160, and may be
candidates for auditing. Those cases evaluated as "unusual cases" are stored in case
database 165, and may be candidates for standard underwriting. Those cases
evaluated as "best cases" are stored in case database 170, and may be candidates for
using with the test sets. The outlier detector and filter 180 may ensure that any new
addition to the best-case database 170 will be consistent (in the dominance sense
described below) with the existing cases, preventing logical outliers from being used.
System 100 of Fig. 1 will now be described in greater detail below.

According to an embodiment of the invention, the fusion process as disclosed in Fig.
1 includes four general steps. These steps are: (1) collection, discounting and
post-processing of modules' outputs; (2) determination of a combined decision via the
associative fusion of the modules' outputs; (3) determination of degree of confidence;
and (4) identification of cases that are candidates for test set, auditing, or standard
reference decision process, via the comparison module 150. These steps will now be
described in greater detail below.

Each quality assurance decision module 110 generates an output vector
I = [I(1), I(2), ..., I(N+1)], where I(i) ∈ [0, M], where M is a large real value and N
is the number of rate classes. In the embodiment of the invention illustrated in Fig. 1,
each vector I is identified by a superscript associated with the quality assurance
decision module 120 that generates the vector. Therefore, I^C is generated by
case-based reasoning decision engine 112, I^M is generated by MARS decision engine
114, I^N is generated by neural network decision engine 116, and I^D is generated by
dominance-based decision engine 118. Further, each entry I(i), for i = 1, ..., N, can be
considered as the (un-normalized) degree to which the case could be classified in rate
class i. The last element, I(N+1), indicates the degree to which the case cannot be
decided and the entire universe of rate classes is selected.

For illustration purposes, assume that five rate classes are used, i.e., N=5, namely:

Rate Class = {Preferred Best, Preferred, Select, Standard Plus, Standard, No Decision
(Send to UW)}

By way of this example, assume that the output of the first classifier (CBE) is: I^C =
[0.3, 5.4, 0.3, 0, 0, 0]. This indicates that the second rate class (e.g., Preferred) is
strongly supported by the classifier. Normalizing I^C to see the support as a percentage
of the overall weights, I^C = [0.05, 0.9, 0.05, 0, 0, 0], shows that 90% of the weight is
assigned to the second rate class.

Further, to represent partial ignorance, i.e., cases in which the classifier does not have
enough information to make a more specific rate classification, discounting may be
used. According to an embodiment of the invention, discounting may involve the
assignment of some weight to the last element, corresponding to the universe U (No
Decision: Send to UW). For example, the previous assignment of I^C could be changed
such that I^C = [0.3, 1.4, 0.3, 0, 0, 4], and its normalized assignment would be
I^C = [0.05, 0.23, 0.05, 0, 0, 0.67]. This example shows how 67% of the weights have
now been assigned to the universe of discourse U (the entire set of rate classes). This
feature allows a representation of the lack of commitment by individual modules.
According to an embodiment of the invention, if it is necessary to discount a source
because it is not believed to be credible, competent, or reliable enough in generating
the correct decision, a portion of the weight is transferred to the universe of discourse
(e.g., "any of the above categories"). The determination of the discount may be
derived from meta-knowledge, as opposed to object-knowledge. Object-knowledge is
the level at which each classifier is functioning, e.g., mapping input vectors into
decision bins. Meta-knowledge is reasoning about the classifiers' performance over
time. Discounting could be static or dynamic. Static discounting may be used a priori
to reflect historical (accuracy) performance of each classifier. Dynamic discounting
may be determined by evaluating a set of rules, whose Left Hand Side ("LHS")
defines a situation, characterized by a conjunct of conditions, and whose Right Hand
Side ("RHS") defines the amount by which to discount whichever output is generated
by the classifier. According to an embodiment of the invention, postprocessing may
be used to detect lack of confidence in a source. When this happens, all the weights
may be allocated to the universe of discourse, i.e., refrain from making any decision.
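
The normalization and discounting examples above can be reproduced with a short
Python sketch. The function names are hypothetical. Taking the discounted weight from
the most strongly supported rate class is an assumption inferred from the worked
example; the text does not prescribe how the transferred weight is chosen.

```python
def normalize(weights):
    """Scale an output vector so that its entries sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def transfer_to_universe(weights, amount):
    """Move `amount` of weight from the top specific class to the last element (U).

    Assumption: the weight is taken from the most supported rate class,
    as in the worked example [0.3, 5.4, 0.3, 0, 0, 0] -> [0.3, 1.4, 0.3, 0, 0, 4].
    """
    specific = weights[:-1]
    top = max(range(len(specific)), key=lambda i: specific[i])
    if amount > specific[top]:
        raise ValueError("cannot transfer more weight than the class holds")
    out = list(weights)
    out[top] -= amount
    out[-1] += amount
    return out


def refrain(weights):
    """Complete discounting: allocate all weight to the universe of discourse."""
    return [0.0] * (len(weights) - 1) + [float(sum(weights))]


raw = [0.3, 5.4, 0.3, 0, 0, 0]
committed = normalize(raw)                              # about 90% on class 2
discounted = normalize(transfer_to_universe(raw, 4.0))  # about 67% on U
```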

According to an embodiment of the invention, each decision engine model will
independently perform a post-processing step. For purposes of illustration, the
post-processing used for the neural network model will be described. According to an
embodiment of the invention, to further improve the classification performance of a
neural network module, some post-processing techniques may be applied to the
outputs of the individual networks, prior to the fusion process. For example, if the
distribution of the outputs did not meet certain pre-defined criteria, no decision needs
to be made by the classifier. Rather, the case will be completely discounted by
allocating all of the weights to the entire universe of discourse U. The rationale for
this particular example is that if a correct decision cannot be made, it would be better
not to make any decision rather than making a wrong decision. Considering the
outputs as discrete membership grades for all rate classes, the four features that
characterize the membership grades may be defined as follows, where N is the
number of rate classes and I is the membership function, i.e., the output of the
classifier.

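1. Cardinality

$$C = \sum_{i=1}^{N} I(i)$$

2. Entropy

$$E = -\frac{1}{E_{\max}} \sum_{i=1}^{N} I(i) \times \log(I(i)), \quad \text{where } E_{\max} = -\log(1/N)$$

3. Difference between the highest and the second highest values of outputs

$$D = I_{\max 1} - I_{\max 2}$$

4. Separation between the rank orders of the highest and the second highest
values of outputs

$$S = \mathrm{RankOrder}(I_{\max 1}) - \mathrm{RankOrder}(I_{\max 2})$$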

With the features defined for characterizing the network outputs, the following
two-step criteria may be used to identify the cases with weak decisions:

Step 1: C < τ1 OR C > τ2 OR E > τ3
Step 2: D < τ4 AND S ≤ 1

where τ1, τ2, τ3, and τ4 are the thresholds. The value of the thresholds is typically
dataset dependent. However, in some embodiments, the value of the thresholds may
be independent of the dataset. In the present example related to a neural network
classifier module (which in turn is described in greater detail below), the value of the
thresholds may be first empirically estimated and then fine-tuned by a global
optimizer, such as an evolutionary algorithm. As part of this example, the final
numbers are shown below in Table 1. Other optimization methods may also be used
to obtain the thresholds.



Thresholds    Non-nicotine Users    Nicotine Users
τ1            0.50                  0.30
τ2            2.00                  1.75
τ3            0.92                  0.84
τ4            0.10                  0.21

Table 1

Thus, post-processing may be used to identify those cases for which the module's
output is likely to be unreliable. According to an embodiment of the invention, rather
than rejecting such cases, the model assignment of normalized weights to rate classes
may be discounted by assigning some or all of those weights to the universe of
discourse U.
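
The four features and the two-step criteria can be sketched in Python as follows. This
is only an illustration under stated assumptions: the function names are hypothetical,
a term with I(i) = 0 is taken to contribute nothing to the entropy, the rank order of
an output is taken to be the position of its rate class in the ordered list of rate
classes, and a case is treated as weak when either step fires (the text does not say
how the two steps are combined). The threshold tuples are the Table 1 values.

```python
import math

# (tau1, tau2, tau3, tau4) from Table 1
NON_NICOTINE = (0.50, 2.00, 0.92, 0.10)
NICOTINE = (0.30, 1.75, 0.84, 0.21)


def features(outputs):
    """Return (C, E, D, S) for the membership grades of the N rate classes."""
    n = len(outputs)
    if n < 2:
        raise ValueError("at least two rate classes are required")
    cardinality = sum(outputs)
    e_max = math.log(n)  # equals -log(1/N)
    entropy = -sum(v * math.log(v) for v in outputs if v > 0) / e_max
    # Class indices ordered from highest to lowest output value.
    order = sorted(range(n), key=lambda i: outputs[i], reverse=True)
    first, second = order[0], order[1]
    difference = outputs[first] - outputs[second]
    separation = abs(first - second)  # assumed meaning of rank-order separation
    return cardinality, entropy, difference, separation


def is_weak(outputs, thresholds):
    """Apply Step 1 and Step 2; either one marks the decision as weak (assumption)."""
    t1, t2, t3, t4 = thresholds
    c, e, d, s = features(outputs)
    step1 = c < t1 or c > t2 or e > t3
    step2 = d < t4 and s <= 1
    return step1 or step2
```

A case flagged by `is_weak` would then be discounted, for example by moving all of its
weight to the universe of discourse U.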

As described previously, the fusion module 130 may perform the step of determining
a combined decision via the associative fusion of the decision engine models' outputs.
According to an embodiment of the invention, any general method that can be used to
fuse the output of several classifiers may be used. The fusion method may also be
associative, meaning that given three or more classifiers, any two of the classifiers
may be fused, then fusing the results with the third classifier, and so on, regardless of
the order.

By way of example of determining a combined decision, define m classifiers S1, ...,
Sm, such that the output of classifier Sj is the vector I^j showing the normalized
decision of such classifier to the N rate classes. Recall that the last, (N+1)th, element
represents the classifier's lack of commitment, i.e., I^j = [I^j(1), I^j(2), ..., I^j(N+1)],

where:

$$I^j(i) \in [0,1] \quad \text{and} \quad \sum_{i=1}^{N+1} I^j(i) = 1$$

The un-normalized fusion of the outputs of two classifiers S1 and S2 is further defined
as:

$$F(I^1, I^2) = \mathrm{OuterProduct}(I^1, I^2, T) = A$$

where the outer-product is a well-defined mathematical operation, which in this case
takes as arguments the two N-dimensional vectors I^1 and I^2 and generates as output the
NxN dimensional array A. Each element A(i,j) is the result of applying the operator T
to the corresponding vector elements, namely I^1(i) and I^2(j), e.g.,

$$A(i,j) = T[I^1(i), I^2(j)]$$

and as illustrated in Fig. 2. Matrix 200 illustrates classes 202 and values 204 for
vector I^1 and classes 206 and values 208 for vector I^2. Intersection 210 illustrates one
intersection between the vector I^1 and vector I^2. Other intersections and
representations may also be used.
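
A minimal Python sketch of this generalized outer product is given below. The function
name `fuse` is hypothetical, and the operator T is passed in by the caller; the product
and minimum operators appear only as example arguments.

```python
def fuse(i1, i2, t_norm):
    """Return the array A with A[i][j] = T(I1[i], I2[j]).

    Rows follow the first classifier's vector, columns the second's.
    """
    return [[t_norm(a, b) for b in i2] for a in i1]


i1 = [0.8, 0.15, 0.05, 0.0, 0.0, 0.0]
i2 = [0.9, 0.05, 0.05, 0.0, 0.0, 0.0]

a_product = fuse(i1, i2, lambda x, y: x * y)  # example: product operator
a_minimum = fuse(i1, i2, min)                 # example: minimum operator
```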

The operator T(x,y) may be referred to as a Triangular Norm. Triangular Norms (also
referred to as "T-norms") are general families of binary functions that satisfy the
requirements of the intersection operators. T-norms are functions that map the unit
square into the unit interval, i.e., T: [0,1]x[0,1] -> [0,1]. T-norms are monotonic,
commutative and associative. Their corresponding boundary conditions, i.e., the
evaluation of the T-norms at the extremes of the [0,1] interval, satisfy the truth tables
of the logical AND operator.

As there appear to be an infinite number of T-norms, the five most representative
T-norms for some practical values of information granularity may be selected.
According to an embodiment of the invention, the five T-norms selected are:

T-Norm                                    Correlation Type
T1(x,y) = max(0, x+y-1)                   Extreme case of negative correlation
T1.5(x,y) = max(0, x^0.5 + y^0.5 - 1)^2   Partial case of negative correlation
T2(x,y) = x*y                             No correlation
T2.5(x,y) = (x^-1 + y^-1 - 1)^-1          Partial case of positive correlation
T3(x,y) = min(x,y)                        Extreme case of positive correlation

The selection of the best T-norm to be used as an intersection operation in the fusion
of the classifiers may depend on the potential correlation among the classifiers to be
fused. For example, T3 (the minimum operator) may be used when one classifier
subsumes the other one (e.g., extreme case of positive correlation). T2 may be
selected when the classifiers are uncorrelated (e.g., similar to the evidential
independence in Dempster-Shafer). T1 may be used if the classifiers are mutually
exclusive (e.g., extreme case of negative correlation). The operators T1.5 and T2.5 may
be selected when the classifiers show intermediate stages of negative or positive
correlation, respectively. Of course, it will be understood by one of ordinary skill in
the art that other T-norms may also be used. However, for the purposes of the present
invention, these five T-norms provide a good representation of the infinite number of
functions that satisfy the T-norm properties.
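
The five operators can be sketched in Python as follows. T2, T2.5 and T3 follow the
expressions given above. T1 and T1.5 are written in the forms usually published for
this family (the bounded difference, and the Schweizer and Sklar operator with
exponent 0.5); treat those two as an assumption wherever the listing above is not
legible. The function names are hypothetical.

```python
def t1(x, y):
    """Extreme negative correlation (mutually exclusive classifiers)."""
    return max(0.0, x + y - 1.0)


def t1_5(x, y):
    """Partial negative correlation (assumed form, see the lead-in)."""
    return max(0.0, x ** 0.5 + y ** 0.5 - 1.0) ** 2


def t2(x, y):
    """No correlation (product)."""
    return x * y


def t2_5(x, y):
    """Partial positive correlation: (1/x + 1/y - 1) ** -1, with T = 0 at zero."""
    if x == 0.0 or y == 0.0:
        return 0.0
    return 1.0 / (1.0 / x + 1.0 / y - 1.0)


def t3(x, y):
    """Extreme positive correlation (minimum)."""
    return min(x, y)


T_NORMS = {"T1": t1, "T1.5": t1_5, "T2": t2, "T2.5": t2_5, "T3": t3}
```

Evaluated at the same arguments, the operators are ordered from the most conservative
to the most permissive, T1 <= T1.5 <= T2 <= T2.5 <= T3, which matches the progression
from negative to positive correlation.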

Because the T-norms are associative, so is the fusion operator, i.e.,

$$F(I^1, F(I^2, I^3)) = F(F(I^1, I^2), I^3)$$

Each element A(i,j) represents the fused assignment of the two classifiers to the
intersection of rate classes ri and rj. Fig. 3 illustrates that each rate class is disjoint
and that U 300 is the universe of all (rate) classes. In this example, rate classes r1




wo 2004/099943 



PCT/US2UU4/UU8ZS6 



302, r2 304 to rn 306 are shown. Given that the rate classes are disjoint, there are five
possible situations:

(a) When i = j and i < (N+1) then ri ∩ rj = rj ∩ ri = ri

(b) When i = j and i = (N+1) then ri ∩ rj = U (the universe of rate classes)

(c) When i ≠ j and i < (N+1) and j < (N+1) then ri ∩ rj = ∅ (the empty set)

(d) When i ≠ j and i = (N+1) then U ∩ rj = rj

(e) When i ≠ j and j = (N+1) then ri ∩ U = ri

Fig. 4 depicts a chart 400 that illustrates the result of the intersections of the rate
classes and the universe U, according to an embodiment of the invention. The chart
demonstrates the intersection according to those situations set forth above, such that
when situation (a) occurs, the results are tabulated in the main diagonal identified as
410 in Fig. 4. Further, when situation (b) occurs, the results are tabulated in the
appropriate areas identified as 420 in Fig. 4. When situation (c) occurs, the results are
tabulated in the appropriate areas identified as 430, while when situations (d) or (e)
occur, the results are tabulated in the appropriate areas identified as 440 in Fig. 4. By
way of example, when one application is rated r1 in the first instance and r2 in the
second instance, the intersection may be tabulated at 450, where the column for r1 and
the row for r2 intersect. In this example, the intersection of r1 and r2 is the empty set.
The decisions for each rate class can be gathered by adding up all the weights
assigned to them. According to the five possible situations described above, weights
may be assigned to a specific rate class only in situations (a), (d) and (e), as illustrated
in Fig. 4. Thus, there will be:

$$\mathrm{Weight}(r_i) = A(i,i) + A(i,N+1) + A(N+1,i)$$
$$\mathrm{Weight}(U) = A(N+1,N+1)$$

To illustrate the fusion operator based on T-norms, an example will now be described.
Assume that

I^1 = [0.8, 0.15, 0.05, 0, 0, 0] and I^2 = [0.9, 0.05, 0.05, 0, 0, 0]

This indicates that both classifiers are showing a strong preference for the first rate
class (e.g., "Preferred Best") as they have assigned it 0.8 and 0.9, respectively.
Fusing these classifiers using each of the five T-norm operators defined above will
generate the corresponding matrices A that are shown in the tables in Figs. 5-9, such
that Fig. 5 illustrates an extreme positive correlation, Fig. 6 illustrates a partial
positive correlation, Fig. 7 illustrates no correlation, Fig. 8 illustrates a partial
negative correlation and Fig. 9 illustrates an extreme negative correlation. If the
results are normalized so that the sum of the entries is equal to one, the matrices Â
are generated, as shown in the tables in Figs. 10-14 in a manner corresponding to the
un-normalized results. During the process, the un-normalized matrices A (Figs. 5-9)
may be used to preserve the associative property. At the end, the normalized matrices
Â are used (Figs. 10-14). Using the expressions for weights of a rate class, the final
weights for the N rate classes and the universe U from Figs. 10-14 can be computed.
The computation of the final weights is illustrated in the chart of Fig.
15. Chart 1500 illustrates the five classes 1510, the five T-norms 1520, and the fused
intersection results 1530.
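
The weight expressions can be applied to this example with a short Python sketch. It
uses only the product operator T2, and the function names are hypothetical. The values
it computes follow from the expressions above; they are not copied from Figs. 5-15.

```python
def fuse_product(i1, i2):
    """Un-normalized fusion with T2(x, y) = x * y."""
    return [[a * b for b in i2] for a in i1]


def normalize_matrix(a):
    """Scale A so that all of its entries sum to one."""
    total = sum(sum(row) for row in a)
    return [[v / total for v in row] for row in a]


def class_weights(a):
    """Return ([Weight(r_1), ..., Weight(r_N)], Weight(U)) for an (N+1)x(N+1) array."""
    n = len(a) - 1  # index of the universe element
    weights = [a[i][i] + a[i][n] + a[n][i] for i in range(n)]
    return weights, a[n][n]


i1 = [0.8, 0.15, 0.05, 0.0, 0.0, 0.0]
i2 = [0.9, 0.05, 0.05, 0.0, 0.0, 0.0]
a_hat = normalize_matrix(fuse_product(i1, i2))
weights, weight_u = class_weights(a_hat)
```

Under T2 most of the weight lands on the first rate class, and whatever is not assigned
to a rate class or to U sits on empty intersections, i.e., it represents conflict.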

According to an embodiment of the invention, the confidence in the fusion may be
calculated by defining a measure of the scattering around the main diagonal. The
more the weights are assigned to elements outside the main diagonal, the less is the
measure of the consensus among the classifiers. This concept may be represented by
defining a penalty matrix P = [P(i,j)], of the form:

$$P(i,j) = \begin{cases} \max(0, 1 - W \cdot |i-j|)^{d} & \text{for } 1 \le i \le N \text{ and } 1 \le j \le N \\ 1 & \text{for } i = (N+1) \text{ or } j = (N+1) \end{cases}$$

This function rewards the presence of weights on the main diagonal, indicating
agreement between the two classifiers, and penalizes the presence of elements off the
main diagonal, indicating conflict. The conflict increases in magnitude as the distance
from the main diagonal increases. For example, for W=0.2 and d=5 we have the
penalty matrix set forth in Fig. 16. Matrix 1600 intersects the column classes 1610
with the row classes 1620 to determine the appropriate penalty.

Other functions penalizing elements off the main diagonal, such as any suitable
non-linear function of the distance from the main diagonal, i.e., the absolute value |i-j|,
could also be used. The penalty function is used because the conflict may be gradual,
as the (rate) classes have an ordering. Therefore, the penalty function captures the
fact that the discrepancy between rate classes r1 and r2 is smaller than the
discrepancy between r1 and r3. The shape of the penalty matrix P in Fig. 16 captures
this concept, as P 1600 shows that the confidence decreases non-linearly with the
distance from the main diagonal. A measure of the normalized confidence C is the
sum of element-wise products between Â and P 1600, e.g.:

$$C = \mathrm{NormalizedConfidence}(\hat{A}, P) = \sum_{i=1}^{N+1} \sum_{j=1}^{N+1} \hat{A}(i,j) \cdot P(i,j)$$

where Â is the normalized fusion matrix. The results of the fusion of classifiers S1
and S2, using each of the five T-norms with the associated normalized confidence
measure, are shown in Fig. 15.
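
A minimal Python sketch of the penalty matrix and the normalized confidence follows.
The function names are hypothetical, indices are zero-based in the code, and the
example uses the product operator together with W = 0.2 and d = 5.

```python
def penalty_matrix(n, w, d):
    """Build the (N+1)x(N+1) penalty matrix P for N ordered rate classes."""
    size = n + 1
    p = [[1.0] * size for _ in range(size)]  # last row and column stay at 1
    for i in range(n):
        for j in range(n):
            p[i][j] = max(0.0, 1.0 - w * abs(i - j)) ** d
    return p


def normalized_confidence(a_hat, p):
    """Sum of element-wise products of the normalized fusion matrix and P."""
    size = len(p)
    return sum(a_hat[i][j] * p[i][j] for i in range(size) for j in range(size))


i1 = [0.8, 0.15, 0.05, 0.0, 0.0, 0.0]
i2 = [0.9, 0.05, 0.05, 0.0, 0.0, 0.0]
# With the product operator and normalized inputs, A already sums to one.
a_hat = [[a * b for b in i2] for a in i1]

p = penalty_matrix(5, 0.2, 5)
confidence = normalized_confidence(a_hat, p)
```

Weight on the main diagonal counts in full, weight one class away from it is scaled by
0.8 ** 5, and weight further away contributes progressively less.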

In a situation in which there is a discrepancy between the two classifiers, this will
be captured by the confidence measure. For instance, consider a situation different
from the assignment illustrated in Figs. 5-14, in which the classifiers agreed to select
the first rate class. Now, e.g., assume that the two classifiers are showing strong
preferences for different rate classes: the first classifier is selecting the second rate
class, while the second classifier is favoring the first class:

I^1 = [0.15, 0.85, 0.05, 0, 0, 0] and I^2 = [0.9, 0.05, 0.05, 0, 0, 0]

The results of their fusion are summarized in the table of Fig. 17, where the chart
1700 illustrates the rate classes 1710, the T-norms 1720 and the fused intersection
results 1730. None of the rate classes have a high weight and the normalized
confidence has dropped.

According to an embodiment of the invention, it may be desirable to be able to
discount one of the classifiers, to reflect a lack of confidence in its reliability.
For example, the second classifier (S2) in the first example (in which the classifiers
seemed to agree on selecting the first rate class) may be discounted:

I^1 = [0.8, 0.15, 0.05, 0, 0, 0] and I^2 = [0.9, 0.05, 0.05, 0, 0, 0]

This discounting is represented by allocating some of the classifier's weight, in this
instance 0.3, to the universe of discourse U (U = No decision: Send_to_UW):

I^1 = [0.8, 0.15, 0.05, 0, 0, 0] and I^2 = [0.6, 0.05, 0.05, 0, 0, 0.3]

The results of the fusion of I^1 and I^2 are summarized in Fig. 18 below. Summarization
chart 1800 illustrates the classes 1810, T-norms 1820, the fused intersection results
1830 and the confidence measure 1840. The rate classes have a slightly lower weight
(for T3, T2.5, T2), but the normalized confidence is higher than in Fig.
15, as there is less conflict. Fusion matrices A are shown in the tables of Figs. 19-23,
while the tables of Figs. 24-28 illustrate matrices Â. According to an embodiment of
the invention, a fusion rule based on Dempster-Shafer corresponds to the selection of:

a) T-norm operator T(x,y) = x*y; and

b) Penalty function using W = 1 (or alternatively d = ∞)

Constraint b) implies the penalty matrix P 2900 illustrated in Fig. 29. Therefore, the
two additional constraints a) and b) required by Dempster-Shafer theory (also referred
to as "DS") imply that the classifiers to be fused must be uncorrelated (e.g.,
evidentially independent) and that there is no ordering over the classes, and any kind
of disagreement (e.g., weights assigned to elements off the main diagonal) can only
contribute to a measure of conflict and not, at least to a partial degree, to a measure of
confidence. In DS, the measure of conflict K is the sum of weights assigned to the
empty set. This corresponds to the elements with a 0 in the penalty matrix P 2900
illustrated in Fig. 29.

According to an embodiment of the invention, the normalized confidence C described
above may be used as a measure of confidence, i.e.:

$$C = \mathrm{NormalizedConfidence}(\hat{A}, P) = \sum_{i=1}^{N+1} \sum_{j=1}^{N+1} \hat{A}(i,j) \cdot P(i,j)$$

The confidence factor C may be interpreted as the weighted cardinality of the
normalized assignments around the main diagonal, after all the classifiers have been
fused. In the case of DS, the measure of confidence C is the complement (to one) of
the measure of conflict K, i.e.: C = 1 - K, where K is the sum of weights assigned to
the empty set.
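
The DS special case can be sketched in Python as follows. The sketch assumes that each
classifier assigns weight only to single rate classes and to the universe U, as in the
vectors used throughout this description. The function names are hypothetical, and
dividing the surviving weights by 1 - K is the usual Dempster normalization, named here
explicitly because the text states only C = 1 - K.

```python
def fuse_product(i1, i2):
    """Fusion with T(x, y) = x * y, i.e., evidential independence."""
    return [[a * b for b in i2] for a in i1]


def conflict(a):
    """K: weight on empty intersections (distinct specific rate classes)."""
    n = len(a) - 1
    return sum(a[i][j] for i in range(n) for j in range(n) if i != j)


def dempster_combine(i1, i2):
    """Return (combined weights for r_1..r_N and U, confidence C = 1 - K)."""
    a = fuse_product(i1, i2)
    n = len(a) - 1
    k = conflict(a)
    if k >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    masses = [(a[i][i] + a[i][n] + a[n][i]) / (1.0 - k) for i in range(n)]
    masses.append(a[n][n] / (1.0 - k))
    return masses, 1.0 - k


# Discounted example: 0.3 of the second classifier's weight moved to U.
i1 = [0.8, 0.15, 0.05, 0.0, 0.0, 0.0]
i2 = [0.6, 0.05, 0.05, 0.0, 0.0, 0.3]
masses, confidence = dempster_combine(i1, i2)
```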

An additional feature of the present invention is the identification of cases that are
candidates for a test set, auditing, or standard reference decision process via the
comparison module. As illustrated previously in Fig. 1, the comparison module has
four inputs. These inputs include the decision of the production engine, which
according to an embodiment of the invention, is one of five possible rate classes or a
no-decision (e.g., "send the case to a human underwriter"), i.e.:

D(FLE) = r1 and r1 ∈ {Best, Preferred, Select, Standard_Plus, Standard,
Send_to_UW}

An additional input may comprise the decision of the fusion module, which according
to an embodiment of the invention, is also one of five possible rate classes or a
no-decision (e.g., "send the case to a human underwriter"), i.e.:

D(FUS) = r2 and r2 ∈ {Best, Preferred, Select, Standard_Plus, Standard,
Send_to_UW}

An additional input may comprise the degree of confidence in the production engine
decision. The computation of the confidence measure is described in the U.S. Patent
Application Serial Nos. 10/173,000 and 10/171,575, entitled "A Process/System for
Rule-Based Insurance Underwriting Suitable for Use by an Automated System," the
contents of which are incorporated herein by reference. This measure may be equated
to the degree of intersection of the soft constraints used by a fuzzy logic engine
("FLE"). This measure may indicate if a case had all its constraints fully satisfied
(and thus C(FLE) = 1) or whether at least one constraint was only partially satisfied
(and therefore C(FLE) < 1).

An additional input may comprise the degree of confidence in the fusion process. The
normalized confidence measure C is C(FUS). According to an embodiment of the
invention, the first test performed is to compare the two decisions, i.e., D(FLE) and
D(FUS). Fig. 30 illustrates all the possible comparisons between the decision of the
production engine and the fusion module. Comparison matrix 3000 illustrates the
D(FLE) classes 3010 and the D(FUS) classes 3020. From the table it can be seen that
label A shows that D(FLE) = D(FUS) and they both indicate the same, specific rate
class. Further, label B shows that the fusion module made no automated decision and
suggested to send the application to a human underwriter, i.e., D(FUS) = No Decision.
Label C shows that D(FLE) ≠ D(FUS) and that both D(FLE) and D(FUS) indicate a
specific, distinct rate class. In addition, label D shows that D(FLE) ≠ D(FUS), and in
particular, that the FLE made no automated decision and suggested to send the
application to a human underwriter, while the fusion module selected a specific rate
class. Label E shows that D(FLE) = D(FUS) and that both D(FLE) and D(FUS) agree
not to make any decision.
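
The five labels can be expressed as a small Python function. The function name and the
string used to stand for "no decision" are hypothetical.

```python
NO_DECISION = "Send_to_UW"  # hypothetical token for "send to a human underwriter"


def comparison_label(d_fle, d_fus):
    """Map the production engine and fusion decisions to a label A-E."""
    if d_fle == d_fus:
        # Both agree: on no decision (E) or on one specific rate class (A).
        return "E" if d_fle == NO_DECISION else "A"
    if d_fus == NO_DECISION:
        return "B"  # fusion module abstained, production engine decided
    if d_fle == NO_DECISION:
        return "D"  # production engine abstained, fusion module decided
    return "C"      # two distinct, specific rate classes
```

For example, `comparison_label("Preferred", "Select")` yields label C, the case in which
the two engines commit to different rate classes.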

A second test may be done by using this information in conjunction with the measures
of confidence C(FLE) and C(FUS) associated with the two decisions. With this
information, the performance of the decision engine may be assessed over time by
monitoring the time statistics of these labels, and the frequencies of cases with a low
degree of confidence. According to an embodiment of the invention, a stable or
increasing number of label A's would be an indicator of good, stable operations. An
increase in the number of label B's would be an indicator that the fusion module (with
its models) needs to be retrained. These cases might be shown to a team of senior
underwriters for a standard reference decision. An increase in the frequency of label
C's or of cases with low confidence could be a leading indicator of increased
classification risk and might warrant further scrutiny (e.g., auditing, retraining of the
fusion models, re-tuning of the production engine). An increase in label D's may
demonstrate that either the production engine needs re-tuning and/or the fusion
module needs retraining. An increase in label E's may demonstrate an increase in
unusual, more complex cases, possibly requiring the scrutiny of senior underwriters.
Thus, the candidates for the auditing process will be the ones exhibiting a low degree
of confidence (C(FUS) < T1), regardless of their agreement with the FLE, and the
ones for which the fusion module and the production engine disagree, i.e., the ones
labeled C.

The candidates for the standard reference decision process are the cases for which the
fusion module shows no decisions (labeled B or E). The candidates to augment the
test set may be selected among the cases for which the fusion module and the
production engine agree (label A). These cases may be filtered to remove the cases in
which the production engine was of borderline quality (C(FLE) < T2) and the cases
in which the confidence measure of the fusion was below complete certainty (C(FUS)
< T1). Thresholds T1 and T2 may be data dependent and must be obtained
empirically. By way of example, T1=0.15 and T2=1. Table 2 below summarizes the
conditions and the quality assurance actions required, according to an embodiment of
the invention. Dashes ("-") in the entries of the table may indicate that the result of
the confidence measures are not material to the action taken and/or to the label
applied.

Columns: Label (from the comparison table); Confidence Measures C(FLE) and C(FUS); ACTION

Labels: A, B, C, D, E

Confidence measure entry: <T3

ACTION entries:
    Candidate to be added to data set for tuning of FLE
    Candidate for Stand Ref Dec. Process. After enough cases are collected, re-tune the classifiers
    Candidate for Auditing
    Candidate for Stand Ref Dec. Process. After enough cases are collected, re-tune the classifiers
    Candidate for Stand Ref Dec. Process. After enough cases are collected, re-tune the classifiers
    Candidate for Auditing

Table 2

According to an embodiment of the invention, the fusion module may be
implemented using software code on a processor. By way of an example of the
results of an implementation of the present invention, a fusion module was tested
against a case base containing a total of 2,879 cases. After removing 173 UW cases,
the remaining 2,706 cases were segmented into 831 nicotine users, with three rate
classes, and 1,875 non-nicotine users, with five rate classes. These cases were then
used to test the fusion process. Because the cases for which the production engine
had made no decision were removed, a comparison matrix similar to the one of
Table 1400 will only have labels A, B, and C. The fusion was performed using the
T-norm T2(x,y)=x*y.

Fig. 31 illustrates the effect of changing the threshold T1 on the measure of
confidence C, where 0 ≤ C ≤ 1. Table 3100 displays decisions 3110, confidence
thresholds 3120 and the case distributions 3130 based on the confidence threshold
3120. Each column shows the number of cases whose measure of confidence C is ≥
T1. As the threshold is raised, the number of "No Fusion Decision" increases. A "No
Fusion Decision" occurs when the results of the fusion are deemed too weak to be
used. When the threshold T is 1, no case is rejected on the basis of the measure of
conflict. This leaves 36 cases for which no decision could be made. As the threshold
is decreased, decisions with a high degree of conflict are rejected, and the number of
"No Fusion Decisions" increases.

"Agreements" occur when the fused decision agrees with the FLE and with the 
Standard Reference Decision (SRD). "False Positives" occur when the fused decision 
disagrees with the FLE, which in turn is correct since the FLE agrees with the 
Standard Reference Decision ("SRD"). "False Negatives" occur when the fused
decision agrees with the FLE, but both the fusion decision and the FLE are wrong, as
they disagree with the SRD. "Corrections" occur when the fused decision agrees with
the SRD and disagrees with the FLE. Finally, "Complete Disagreement" occurs when
the fused decision disagrees with the FLE, and both the fused decision and the FLE
disagree with the SRD. Further, similar results were obtained for nicotine users, and
these results are illustrated in Fig. 32, with table 3200 displaying decisions 3210,
confidence thresholds 3220 and the case distributions 3230 based on the confidence
thresholds 3220.
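
The five outcome names defined above can be expressed as a small decision function. The following is only a minimal sketch: the function name fusion_outcome, the use of None for a missing fused decision, and the rate class "STD" in the usage note are hypothetical and are not taken from the described implementation.

```python
from typing import Optional


def fusion_outcome(fused: Optional[str], fle: str, srd: str) -> str:
    """Name the outcome of one case from the fused, FLE and SRD decisions."""
    if fused is None:
        # Assumption: None stands for a fusion result too weak to be used.
        return "No Fusion Decision"
    if fused == fle:
        # The fusion sides with the FLE; whether both are right depends on the SRD.
        return "Agreement" if fle == srd else "False Negative"
    # From here on the fused decision disagrees with the FLE.
    if fle == srd:
        return "False Positive"
    if fused == srd:
        return "Correction"
    return "Complete Disagreement"


if __name__ == "__main__":
    # Fused decision matches the SRD while the FLE does not.
    print(fusion_outcome("BEST", "PREF", "BEST"))
```

For instance, a case with fused decision "BEST", FLE decision "PREF" and SRD "STD" would fall under "Complete Disagreement", since all three decisions differ.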

Fig. 33 illustrates a Venn diagram 3300 illustrating the situation for the threshold T1
= 0.15 (i.e., for C > 0.15) for the non-nicotine users, while Fig. 34 illustrates a Venn
diagram 3400 illustrating the situation for the threshold T1 = 0.15 (i.e., for C > 0.15)
for the nicotine users. In the case of the non-nicotine users (for T1 = 0.15) the
following labels result:

A: 1,588+27 = 1,615 (86.13%) in which D(FUS)=D(FLE) (e.g., agreements 3310
and false negatives 3320);

B: 36 (1.92%) in which the fusion did not make any decision (from C = 0);

C1: 212-36 = 176 (9.39%) in which the fusion was too conflictive (C < 0.15);
and

C2: 22+25+1 = 48 (2.56%) in which D(FUS)≠D(FLE) (e.g., false positives
3330, corrections 3340 and complete disagreements 3350).

In the case of the nicotine users (for T1 = 0.15), the following labels result:

A: 729+15 = 744 cases (89.5%) in which D(FUS)=D(FLE) (e.g., agreements
3410 and false negatives 3420);

B: 37 cases (4.5%) in which the fusion did not make any decision
(from C = 0);

C1: 68-37 = 31 cases (3.7%) in which the fusion was too conflictive
(C < 0.15); and

C2: 16+3 = 19 cases (2.3%) in which D(FUS)≠D(FLE) (e.g., false positives
3430, corrections 3440 and complete disagreements 3450).
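
The label assignment and the percentages listed above can be reproduced with a short sketch. The function name qa_label and its argument names are hypothetical, and the order in which the tests are applied (no decision first, then low confidence, then agreement) is an assumption read off the list above rather than a statement of the actual implementation.

```python
from typing import Optional


def qa_label(d_fus: Optional[str], d_fle: str, c_fus: float, t1: float = 0.15) -> str:
    """Assign label A, B, C1 or C2 to one case (assumed test order)."""
    if d_fus is None or c_fus == 0:
        return "B"   # the fusion did not make any decision
    if c_fus < t1:
        return "C1"  # the fusion was too conflictive
    return "A" if d_fus == d_fle else "C2"


# Case counts quoted above for the non-nicotine users.
counts = {"A": 1615, "B": 36, "C1": 176, "C2": 48}
total = sum(counts.values())
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

if __name__ == "__main__":
    print(total, shares)
```

The four counts add up to the 1,875 non-nicotine cases, and the computed shares match the percentages quoted for labels A, B, C1 and C2.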

According to the present example, since there is no SRD in production, there can only
be reliance on the degree of conflict and the agreement between the fused decision
and the FLE. If the disagreement between FLE and FUS (e.g., subset C2) is used, it
can be observed that the number of cases in which the fusion will disagree with the
FLE, and make a classification, is 48/1875 (2.56%) for non-nicotine users and 19/831
(2.3%) for nicotine users. This may be considered a manageable percentage of cases
to audit. Further, this sample of cases may be augmented by additional cases sampled
from subsets C1.

A further analysis of set C2 in the case of non-nicotine users shows that out of 48
cases, the fusion module called 22 of them correctly and 26 of them incorrectly.
From the 26 incorrectly called cases, 14 cases were borderline cases according to the 
FLE. This illustrates that the problematic cases may be correctly identified and are 
good candidates for an audit. 

A further analysis of set C2 in the case of nicotine users shows that out of 19 cases,
the fusion module incorrectly called 16. Of these 16 cases, 6 cases were borderline
cases, i.e., the FLE only had a partial degree of satisfaction of the intersection of all the
constraints (C(FLE) < 0.9). Furthermore, 11 cases had a conflict measure C < 0.4.
If the union of these two subsets (e.g., the borderline cases and the conflict measure
cases) is taken, the result is 13 cases that are either borderline (from the FLE) or
have low confidence in the fusion, and the remaining 3 cases were ones that the CBE
could not classify (i.e., it could not find enough similar cases). This again
demonstrates that the problematic cases may be generally correctly identified and are
worth auditing.

The set B (4.5%) illustrates a lack of commitment and is a candidate for a review to
assign an SRD. The set A may be a starting point to identify the cases that could go
to the test set. However, set A may need further filtering by removing all cases that
were borderline according to the FLE (i.e., C(FLE) < T2), as well as removing those
cases whose fusion confidence was too low (i.e., C(FUS) < 1). Again, T2 will be
determined empirically, from the data.

Various aspects of the fusion module will now be discussed in greater detail below. It
is understood that various portions of the fusion module, as well as the different aspects
described below, may be performed in different manners without departing from the
scope of the invention.

2. Outlier Detector 

One component of a fusion module may be determining outlier applications.
According to an embodiment of the invention, it may be desirable to detect all
classification assignments to applications, such as insurance applications, that are
inconsistent and therefore potentially incorrect. Applications that are assigned these
inconsistent labels may be defined as outliers. The concept of outliers may extend
beyond the realm of insurance underwriting and be intrinsic to all risk classification
processes, of which the determination of the proper premium to cover a given risk
(i.e., insurance underwriting) is just an example. Therefore, the ultimate domain of
this invention may be considered risk classification, with a focus on insurance
underwriting.

According to an embodiment of the invention, the existing risk structure of the risk
classification problem is exploited from the risk assignments made by the
underwriters, similar to the dominance-based classifier described in greater detail
below. But whereas the dominance-based classifier uses the risk structure to produce
a risk assignment for an unlabeled application, the outlier detector examines the risk
structure to find any applications that might have been assigned an
incorrect risk assignment by the underwriter.

The outlier detector may add to the rationality of the overall underwriting process by
detecting globally inconsistent labels and bringing them to the attention of human
experts. Many papers in the decision sciences demonstrate that in the presence of
information overload, humans tend to be boundedly rational and often,
unintentionally, violate compelling principles of rationality like dominance and
transitivity. The outlier detector may attempt to counter these drawbacks exhibited by
human decision-makers and make the decision-making process more rational. As a
result, the risk assignments can be expected to be more optimal and consistent.

Further, by bringing these globally inconsistent risk assignments to the attention of
the underwriters, the system may gain knowledge about exceptional decision rules, or
additional features that are implicitly used by experts and which may be left
unmentioned during the initial design stages of an automated system. This additional
knowledge may be used to improve the performance of any automated system. Thus,
the outlier detector may also act as a knowledge-eliciting module.

By removing globally inconsistent risk assignments from the initial set, the detection
of outliers may further improve the performance and simplicity of other supervised
classification systems, such as neural networks and decision-tree classifiers, when
used as the primary automated system. This is because the presence of global
inconsistencies may add to the "non-separability" of the feature space, which will
often lead to either inferior learning or very complicated architectures. As the outlier
detector reduces the number of global inconsistencies, a cleaner, more consistent
training set may be expected to result in better learning by a simpler system.
Hence, the outlier detector may improve the classification accuracy and simplicity of
other automated systems.

Because the outlier detector uses the principle of dominance to capture the risk 
structure of the problem, the outlier detector has explanation capability to account for 
its results. This is because dominance is a compelling principle of rationality and thus
the outliers detected by the system are rationally defensible. 

According to an embodiment of the invention, the functionality of the outlier
detection system may be generic, so that it can be used to detect outliers for any
preference-based problem where the candidates in question are assigned preferences
based on the values that they take along a common set of features, and the preference
of a candidate is a monotonic function of its feature-values. Therefore, the
applicability of an outlier detection system transcends the problem of insurance
underwriting, and can be easily extended to any risk classification process.

In many domains where expert opinions are used to score entities, the set of entities
that have already been scored are stored as precedents, cases, or reference data points
for use in future scoring or comparison with new candidates. The outlier detector can
help in ensuring that any new candidate case that goes into the reference dataset will
always lead to a globally consistent dataset, thereby ensuring that the reference
dataset is more reliable.

According to an embodiment of the invention, an outlier detector may exploit the
existing risk structure of a decision problem to discover risk assignments that are
globally inconsistent. The technique may work on a set of candidates for which risk
categories have already been assigned (e.g., in the case of insurance underwriting,
this would pertain to the premium class assigned to an application). For this
set of labeled candidates, the system may find all such pairs of applications belonging
to different risk categories, which violate the principle of dominance. The outlier
detector attempts to match the risk ordering of the applications with the ordering
imposed by dominance, and uses any mismatch during this process to identify
applications that were potentially assigned incorrect risk categories.

As described previously, automating an insurance underwriting process may involve
trying to emulate the reasoning used by the human expert while assigning premium
classes to insurance applications, and finding computable functions that capture those
reasoning principles. According to an embodiment of the invention, the risk category
of an application depends upon the values taken by the application along various
dimensions, such as Body Mass Index ("BMI"), Cholesterol Level, and Smoking
History. The values of the dimensions are then used to assign risk categories to
insurance applications. An automated system would operate on these same features
while trying to emulate the underwriter. Typically, the risk associated with an
application changes with changes to the magnitude of the individual features. For
example, assuming that all other features remain the same, if the BMI of an
applicant increases, the application becomes riskier. The outlier detector uses this
knowledge to detect all such applications that do not satisfy the principle of
dominance.

According to an embodiment of the invention, there is a monotonic non-decreasing
relationship between all the feature-values and the associated risk (e.g., higher values
imply equal-or-higher risk). Variables that do not meet this relationship may be
substituted by their mirror image, which will then satisfy this condition. For instance,
let us assume that the relevant medical information for a non-smoker applicant is
captured by the following five variables:

X1 = Cholesterol,

X2 = Cholesterol Level,

X3 = Systolic Blood Pressure,

X4 = Diastolic Blood Pressure,

X5 = Years since quitting smoking (if applicable).

Mortality risk is monotonically non-decreasing with respect to the first four variables,
meaning that such risk can increase (or remain the same) as the values of the four
variables increase. However, higher values in the fifth variable have a positive effect,
as they decrease the mortality risk. Therefore, the fifth variable needs to be
transformed into another variable. By way of example, X5 may be transformed into
X5', where X5' is defined as X5' = K - X5 = K - "years since quitting smoking". K is a
constant, e.g., K=7, so that higher values of X5' will reflect same or increased
mortality risk. Other relationships between all the feature-values may also be used.
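
The mirror-image substitution described above can be sketched in a few lines. The function name mirror is hypothetical, K=7 is the example constant given above, and the absence of any clipping for values larger than K is an assumption of this sketch.

```python
def mirror(value: float, k: float = 7.0) -> float:
    """Return K - value, so that a larger result means equal-or-higher risk."""
    # Assumption: no clipping is applied; values above K simply become negative.
    return k - value


if __name__ == "__main__":
    # Years since quitting smoking for three hypothetical applicants.
    for years in (0, 2, 6):
        print(years, mirror(years))
```

An applicant who quit more recently has a smaller X5 and therefore a larger X5', so the transformed variable moves in the same direction as the other four variables.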

Further, if two insurance applicants A and B are compared where applicants A and B
are identical along all features, except that applicant B has a higher BMI than A,
then the risk associated with applicant A cannot be greater than that associated with
B. In other words, the premium associated with the rate class assigned to A should not
be higher than the one assigned to B. The above reasoning principle is referred to, in
decision theory, as the principle of dominance, and in the above example applicant A
dominates applicant B. The terminology dominates(A,B) is used to capture this
relation between applicant A and applicant B.

For example, given two applications A and B, it can be said that application A
dominates application B if and only if application A is at least as good as application
B along all the features and there is at least one feature along which application A is
strictly better than application B. The dominates relation may be based on the above
definition of dominance. It is a trichotomous relation, meaning that given two
applications A and B, either application A dominates application B, application B
dominates application A, or neither dominates the other. In the case where neither
applicant dominates the other, each application may be better than its counterpart
along different features. In such a case, application A and application B may be said
to be dominance-tied. For example, as illustrated in Table 3 below, assume there are
three applicants A, B, and C with the following feature values:

Application    BMI    Cholesterol    BP_sys
A              25     255            115
B              26     248            120
C              24     248            112

Table 3

Assume, for simplicity, that these are the only three features used to assess the risk of
an applicant. By the definition, it can be seen that application C dominates both
application A and application B, since application C is at least as good (e.g., as low)
as application A and application B along each feature, and moreover there is at least
one feature along which application C is strictly better (e.g., strictly lower) than both
application A and application B. However, application A and application B are
dominance-tied since each is better (e.g., lower) than the other along some feature
(application A has the better BMI value while application B has the better cholesterol
value).
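
The dominance comparison for the three applications of Table 3 can be sketched as follows. The function names dominates and compare are hypothetical; the sketch assumes, as in the example above, that lower values are better along every feature.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as low as b on every feature and strictly lower on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def compare(a: Sequence[float], b: Sequence[float]) -> str:
    """Report which of the three possible outcomes holds for the pair (a, b)."""
    if dominates(a, b):
        return "first dominates second"
    if dominates(b, a):
        return "second dominates first"
    return "dominance-tied"


# Feature order: (BMI, Cholesterol, BP_sys), values taken from Table 3.
A = (25, 255, 115)
B = (26, 248, 120)
C = (24, 248, 112)

if __name__ == "__main__":
    print("C vs A:", compare(C, A))
    print("C vs B:", compare(C, B))
    print("A vs B:", compare(A, B))
```

Note that two applications with identical feature values would also be reported as dominance-tied by this sketch, since neither is strictly better along any feature.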

According to an embodiment of the invention, the relation No_Riskier_Than(A,B) is
true if the risk associated with applicant A (say rA) is no higher than that associated
with applicant B (say rB), i.e.,

No_Riskier_Than(A,B) ⇔ (rA ≤ rB).

According to an embodiment of the invention, based on the assumption that the risk
associated with an applicant is a monotonic non-decreasing function of the feature
values, it can be seen that for any pair of insurance applications, if the dominates
relation holds between the two applications in a certain direction (e.g., application A
dominates application B), then the No_Riskier_Than relation will also hold in the
same direction (e.g., application A is No_Riskier_Than application B). In other words,
the dominates relation is a sufficiency condition for the No_Riskier_Than relation.
That is:

dominates(A,B) → No_Riskier_Than(A,B).

An application may be considered an outlier based on one or more characteristics.
According to an embodiment of the invention, application X and application Y are
marked as outliers if application X dominates application Y, and application X is
assigned a risk category that associates greater risk with application X compared to
application Y. According to an embodiment of the invention, application X and
application Y are marked as outliers if application Y dominates application X, and
application Y is assigned a risk category that associates greater risk with application
Y compared to application X.

The above statements can be described formally with the following equation:

(X,Y are outliers) ⇔ ( dominates(X,Y) ∧ (rX > rY) )
                     ∨ ( dominates(Y,X) ∧ (rY > rX) )
As can be seen from the definitions of the dominates relation and the
No_Riskier_Than relation, inconsistent risk assignments may be identified. If
application X dominates application Y, then application X will be at least as good as
application Y along all features and strictly better than application Y along at least
one feature. As a result, logically, application X cannot be riskier than application Y.
Therefore, if the risk assignments made by the underwriters are such that application
X is categorized as being riskier than application Y, then the existing risk assignments
made to application X, to application Y, or to both application X and application Y,
may likely be logically infeasible. Therefore, both application X and application Y
are labeled as outliers, e.g., applications that have inconsistent assignments, and
therefore potentially incorrect risk categories. According to an embodiment of the
invention, in order to exploit the presence of the dominance relation between two
applications and to logically restrict the risk assignment of the two applications, it
may be necessary to ensure that all the features that are being used by the experts
during the risk assignments are also used during the dominance comparisons.

The steps involved in outlier detection according to an embodiment of the invention
are described below and shown in Fig. 35. An outlier module operates on a set A of
applications, each of which has been assigned a risk category from one of the i
possible categories. The system may be thought of as operating on a set of tuples
{(Ai,rx)} where rx is the risk category assigned by the underwriter to application Ai.
The process for outlier detection may be implemented in pseudocode as set forth
below:



Outlier_detect(A:{(Ai,rx)})
{
    for each tuple (Ai,rx) ∈ A
    {
        for each tuple (Aj,ry) ∈ A where ry > rx
        {
            if (dominates(Aj,Ai))
            {
                mark Ai, Aj as outliers;
                break;
            }
            else
                next Aj;
        }
        next Ai;
    }
    Report set of outliers;
}



As defined earlier, outliers are pairs of tuples (Ai,rx), (Aj,ry) where Ai dominates Aj
but ry < rx. Fig. 35 illustrates a flowchart for detecting outliers given a set of labeled
applications. At step 3510, a tuple (Ai,rx) is identified. A tuple (Aj,ry) is identified at
step 3520, where the rate class ry for tuple (Aj,ry) is greater than the rate class rx. At
step 3530, a determination is made whether tuple (Aj,ry) dominates tuple (Ai,rx) (e.g.,
Dominates(Aj,Ai)). If yes, tuples (Ai,rx) and (Aj,ry) are marked as outliers. The system
then determines at step 3550 if there is another tuple (Aj,ry), where ry > rx. This
determination is also made if tuple (Aj,ry) does not dominate tuple (Ai,rx). At step
3550, if there is another tuple (Aj,ry), where ry > rx, the process returns to step 3520.
If there is no other tuple (Aj,ry) where ry > rx, a determination is made at step 3560
whether there is another tuple (Ai,rx). If yes, the process returns to step 3510, while if
not, the system ends at 3570.
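
The flowchart steps above can be sketched in runnable form as follows. This is a minimal illustration only: the names Case, dominates and detect_outliers, the integer encoding of rate classes (a larger number meaning a riskier class), and the four sample cases are all hypothetical. The sketch follows the flowchart in examining every tuple (Aj,ry) with ry > rx for a given (Ai,rx).

```python
from typing import List, Sequence, Set, Tuple

# (identifier, feature values, rate class rank); a higher rank means a riskier class.
Case = Tuple[str, Sequence[float], int]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as low as b on every feature and strictly lower on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def detect_outliers(cases: List[Case]) -> Set[str]:
    """Return identifiers of all cases involved in a dominance violation."""
    outliers: Set[str] = set()
    for id_i, feat_i, r_i in cases:          # steps 3510 and 3560
        for id_j, feat_j, r_j in cases:      # steps 3520 and 3550
            # Aj was assigned the riskier class even though it dominates Ai.
            if r_j > r_i and dominates(feat_j, feat_i):   # step 3530
                outliers.update((id_i, id_j))
    return outliers


# Hypothetical labeled cases; features are (BMI, Cholesterol, BP_sys).
cases: List[Case] = [
    ("A", (25, 255, 115), 1),
    ("B", (26, 248, 120), 2),
    ("C", (24, 248, 112), 2),
    ("D", (30, 260, 130), 3),
]

if __name__ == "__main__":
    print(sorted(detect_outliers(cases)))
```

In this sample, case C dominates case A yet carries the riskier rank, so both are reported; cases B and D take part in no violation. The pairwise scan is quadratic in the number of cases, which is an artifact of the sketch rather than a property stated for the described system.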

According to an embodiment of the invention, an outlier detector may be
implemented in software code, and tested against a database of cases. For example,
an outlier detector may be tested against a database of approximately 2,900 cases. In
such an example, the outlier detector identified more than a dozen subsets
containing at least one inconsistency. The results produced by the outlier detector in
this example are shown in Table 4 below, along with a few relevant feature values.





Curr Risk                                                                              Smoking         Fam_  Fam_
Class      Age  Height  Weight  BP_Sys  BP_Dias  Cholesterol  Chol.Ratio  SGOT  SGPT  GGT  Status   Build  Hist  Death
PREF       63   62      146     112     80       258          4.1         21    16    17   0        26.70  0     0
BEST       29   77      229     132     84       278          4.6         26    22    17   0        27.16  0     0



Table 4 

In Table 4 above, each row represents an insurance application for which the risk
classification had already been determined, as shown in the first column. The risk
class "BEST" is a lower risk class compared to the risk class "PREF." A person
classified in the "BEST" risk class will have to pay a lower premium than a person
classified in the "PREF" class. Yet, it can be seen that the application indicated in the
first row of Table 4 dominates the application of the second row. In the present
example, upon sending these two applications to human underwriters for
reconsideration, the risk classifications for the applications were reversed. This
single example illustrates the use of an outlier detector to obtain more consistent risk
assignments.
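
The dominance violation in Table 4 can be checked with a short sketch. Which columns take part in the dominance comparison is not stated above, so the feature list below (blood pressure, cholesterol, liver enzymes and build, leaving out age, height and weight) is purely an assumption of this sketch, as are the names FEATURES, RISK_RANK and dominates and the numeric ranking of the two classes.

```python
from typing import Dict, Sequence

# Assumed comparison features; lower values are treated as better for all of them.
FEATURES: Sequence[str] = (
    "BP_Sys", "BP_Dias", "Cholesterol", "Chol_Ratio", "SGOT", "SGPT", "GGT", "Build",
)

# Hypothetical numeric ranks: BEST is the lower-risk class.
RISK_RANK: Dict[str, int] = {"BEST": 0, "PREF": 1}

row1 = {"Class": "PREF", "BP_Sys": 112, "BP_Dias": 80, "Cholesterol": 258,
        "Chol_Ratio": 4.1, "SGOT": 21, "SGPT": 16, "GGT": 17, "Build": 26.70}
row2 = {"Class": "BEST", "BP_Sys": 132, "BP_Dias": 84, "Cholesterol": 278,
        "Chol_Ratio": 4.6, "SGOT": 26, "SGPT": 22, "GGT": 17, "Build": 27.16}


def dominates(a: dict, b: dict, features: Sequence[str] = FEATURES) -> bool:
    """True if a is at least as low as b on every feature and strictly lower on one."""
    return (all(a[f] <= b[f] for f in features)
            and any(a[f] < b[f] for f in features))


# A violation: the dominating application carries the riskier class.
inconsistent = (dominates(row1, row2)
                and RISK_RANK[row1["Class"]] > RISK_RANK[row2["Class"]])

if __name__ == "__main__":
    print("row 1 dominates row 2:", dominates(row1, row2))
    print("assignment inconsistent:", inconsistent)
```

Under the assumed feature list, the first row is no worse than the second on every compared feature and strictly better on most, while carrying the riskier class, which is the situation described above.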

As illustrated in Fig. 1 above, outlier detector 180 is shown after the fusion to ensure
that any new addition to the best-cases database would be dominance-consistent with
the existing cases. Another potential use for the outlier detector is its application to
the training-cases database used to train each of the decision engines used by the
fusion module. This is a Quality Assurance step for the training data to ensure that the
training cases do not contain outliers (e.g., inconsistent cases in the dominance sense)
so as to improve the learning phase of the four models illustrated (CBR, NN, MARS,
Dominance) before they are used as run-time classifiers for the Quality Assurance
process of the production engine. According to an embodiment of the invention, as
illustrated in Fig. 36, an outlier detector 3610 and a training case-base 3620 may be
positioned for quality assurance for CBR DE 3630, MARS DE 3640, NN DE 3650
and DOM DE 3660, the output of which is fed into a fusion module (not shown).

3. Dominance Classifier 

According to an embodiment of the invention, the risk structure of an underlying
problem may also be exploited to produce a risk category label for a given
application, such as an insurance application. This risk classification can be assured
to be accurate with a high degree of confidence. Specifically, as described above in
relation to the outlier detector, the application of a dominance classifier may also
provide risk assignments having a high confidence measure. Further, when strict
definitions are implemented, the relative accuracy of the system approaches 100%,
thus minimizing the degree of mismatch between the risk assignment made by a
human underwriter and the automated rate class decisions.

A dominance classifier may have many of the advantages of the outlier detector. The
principle of dominance is a compelling principle of rationality and thus the
classification produced by the technique is rationally defensible. This imparts
explanation capability to the classification, making it transparent and easy to
comprehend. Further, there are no iterative runs involved in tuning. As a result, the
tuning process may be reduced and become less time-consuming. The output of this
dominance-based classifier can be combined in a fusion module with the output(s)
generated by other classifiers. A fusion process may be used for quality assurance of
a production decision engine, to provide a stronger degree of confidence in the
decision of the engine, in the case of consensus among the classifiers, or to suggest
manual audit of the application, in the case of dissent among the classifiers.

According to an embodiment of the invention, automating an insurance application
underwriting process may essentially involve trying to emulate the reasoning used by
a human expert while assigning premium classes to insurance applications, and
finding computable functions that capture those reasoning principles. The risk
category of an application depends upon the values taken by the application along
various dimensions, such as, but not limited to, body mass index (BMI), cholesterol
level, and smoking history. An underwriter makes use of these values to assign risk
categories to the applications. Hence, an automated system should operate on these
same features while trying to emulate the underwriter. Typically, the manner in
which the risk associated with an insurance application changes with changes to the
magnitude of the individual features is also known. For example, when all other
features in an insurance application remain the same, if the BMI of an applicant
increases, the application becomes riskier.

A dominance-based risk classification may use this knowledge to generate a risk
category for a given application, such as an insurance application. According to an
embodiment of the invention, an assumption may be made that there is a monotonic
non-decreasing relationship between all the feature-values and the associated risk
(e.g., higher values imply equal-or-higher risk). For those variables that do not meet
this relationship, a mirror image may be substituted, which will then satisfy this
condition that lower values correspond to lower risk. This can be seen with reference
to Table 3 regarding the outlier detector.

Further, as discussed above with respect to the outlier detector, the relation
dominates(A,B) → No_Riskier_Than(A,B) still holds.

The term Bounded_within(B,{A,C}) may be used when application B is
bounded within application A and application C, which holds if and only if
application A dominates application B and application B dominates application C, i.e.,

Bounded_within(B,{A,C}) ⇔ dominates(A,B) ∧ dominates(B,C).

This relation may then be read as "B is bounded within A and C."

If application B is bounded within two applications A and C, and if the risk category
assigned to applications A and C is the same, then the risk category of application B
has to be the same as that of applications A and C, i.e.,

Bounded_within(B,{A,C}) ∧ (rA = rC = r) → (rB = r).

To better demonstrate this, suppose the following is present:

Bounded_within(B,{A,C}) ∧ (rA = rC = r).

This implies that

dominates(A,B) ∧ dominates(B,C) ∧ (rA = rC = r).

Or,

No_Riskier_Than(A,B) ∧ No_Riskier_Than(B,C) ∧ (rA = rC = r).

Based on the definitions of the relation, the above can be rewritten as

(rA ≤ rB) ∧ (rB ≤ rC) ∧ (rA = rC = r).

In other words,

rB = r,

thereby demonstrating the principle of dominance-based risk classification.

This principle may serve as the basis for a risk classification. For any given
application B with unassigned risk category, a determination is made whether there
exist two applications A and C such that the Left Hand Side (LHS) of the principle is
satisfied, i.e., Bounded_within(B,{A,C}) ∧ (rA = rC = r). If this occurs, the risk
category of application B is assigned to be the same as that of applications A and C.
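
The classification rule above can be sketched as a search for a bounding pair within each risk category. The names dominates and classify, the labeled sample cases, and the choice to return None when no bounding pair exists are assumptions of this sketch; it also compares against every labeled case directly rather than against precomputed subsets.

```python
from typing import List, Optional, Sequence, Tuple

Features = Sequence[float]


def dominates(a: Features, b: Features) -> bool:
    """True if a is at least as low as b on every feature and strictly lower on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def classify(b: Features, labeled: List[Tuple[Features, str]]) -> Optional[str]:
    """Return category r if some A and C, both labeled r, bound b; otherwise None."""
    for category in sorted({r for _, r in labeled}):
        members = [f for f, r in labeled if r == category]
        better_bound = any(dominates(a, b) for a in members)  # dominates(A, B)
        worse_bound = any(dominates(b, c) for c in members)   # dominates(B, C)
        if better_bound and worse_bound:
            return category
    return None


# Hypothetical labeled cases; features are (BMI, Cholesterol, BP_sys).
labeled = [
    ((24, 248, 112), "BEST"),
    ((26, 255, 120), "BEST"),
    ((30, 280, 135), "PREF"),
    ((33, 300, 140), "PREF"),
]

if __name__ == "__main__":
    for candidate in [(25, 250, 115), (31, 290, 138), (28, 260, 125)]:
        print(candidate, classify(candidate, labeled))
```

The third candidate lies between the two categories: it is dominated by a "BEST" case but dominates no "BEST" case, and no "PREF" case dominates it, so the rule makes no assignment for it.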

Even if an application A dominates another application B, the two applications may
still be quite close in terms of their feature-values so that they belong to the same risk
category. In other words, it may be expected for the dominates relation to hold
between some pairs of applications even if the two applications belong to the same
risk category. This may mean that further partitions of the applications within a risk
category may be made, such as into the best, non-dominated subset and the worst,
non-dominating subset.

According to an embodiment of the invention, the best, non-dominated subset for a
given risk category may be defined as the one that contains all such applications that
are not dominated by another application within that risk category. This may also be
referred to as the Pareto-best subset.

According to an embodiment of the invention, the worst, non-dominating subset for a
given risk category may be defined as the one that contains all those applications that
do not dominate even a single application in that risk category. This may also be
referred to as the Pareto-worst subset.

To visualize these two subsets geometrically, Fig. 37 may be referred to, which shows
a plot of features f1 3710 and f2 3720 for 1000 insurance applications. The insurance
applications are plotted as points in the 2-dimensional feature space. For simplicity,
assume that these are the only two features used while assigning a risk category to the
applications, and that the lower values along a feature correspond to a lower risk. In
Fig. 37, circles denote the Pareto-best subset 3730 while the squares denote the
Pareto-worst subset 3740. The circles take the lowest (e.g., the most desirable) values
along both features while the squares take on the highest (e.g., the least desirable)
values. In addition, using the definition of the Pareto-best subset 3730 and the
Pareto-worst subset 3740 as set forth above, each of the remaining insurance
applications is such that at least one application represented by a circle dominates it,
and it dominates at least one application represented by a square. In other words, for
each point X that is not in the Pareto-best subset (O) 3730 or in the Pareto-worst
subset (P) 3740 in Fig. 37, there is at least one square S and one circle C such that
Bounded_within(X,{C,S}) is true. For example, suppose that every circle and square
in Fig. 37 representing an application was assigned the same risk category r. Then, by
applying the principle of dominance-based risk classification, all the points shown in
Fig. 37 can be assigned the risk category r as well.
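
By way of a non-limiting illustration, the two subsets and the Bounded_within check
may be sketched with simple pairwise comparisons. The sketch assumes that lower
feature values are better and that dominance requires being no worse on every feature
and strictly better on at least one. The function names and the reading of
Bounded_within as "C dominates X and X dominates S" are assumptions made for this
example, and the quadratic-time loops are chosen for clarity rather than speed.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Assumed definition: a is no worse than b on every feature and strictly
    better on at least one (lower feature values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_best(points: List[Point]) -> List[int]:
    """Indices of points that are not dominated by any other point."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for q in points)]


def pareto_worst(points: List[Point]) -> List[int]:
    """Indices of points that do not dominate even a single other point."""
    return [i for i, p in enumerate(points)
            if not any(dominates(p, q) for q in points)]


def bounded_within(x: Point, circle: Point, square: Point) -> bool:
    """Assumed reading of Bounded_within(X, {C, S})."""
    return dominates(circle, x) and dominates(x, square)


# Four applications described by two features (f1, f2).
apps = [(1, 1), (2, 3), (3, 2), (4, 4)]
best = pareto_best(apps)    # [0]
worst = pareto_worst(apps)  # [3]
```

Every remaining point in this small example, such as (2, 3), is bounded by the single
circle (1, 1) and the single square (4, 4).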

According to an embodiment of the invention, the production of the two subsets O
and P is identical to the production of the dominance subset in discrete alternative
decision problems. By way of example, articles by Kung, Luccio, and Preparata
(1975), and Calpine and Golding (1976), the contents of which are incorporated
herein by reference, present algorithms which can create these subsets in
O(n · log^(m-1)(n)) time, where n is the number of candidates involved and m is the
number of features along which the dominance comparisons are being done. Hence, for
an underwriting problem with r risk categories, there may be 2r such subsets, or one
pair for each risk category, representing the risk surfaces that form the upper bound
and the lower bound.

According to an embodiment of the invention, an algorithm may produce the
Dominance subset for a given set of alternatives X(n,m), where n is the number of
candidates and m is the number of features used. The term Dominance(X,k) may be
used to indicate the application of such an algorithm to the set X(n,m), where k is
either +1 or -1, depending upon whether higher or lower feature values are desired to
be considered as better during dominance comparisons. According to an embodiment
of the invention, two principal modules, the tuning module and the classification
module, may be used. The tuning module may compute the Pareto-best and
Pareto-worst subsets for each risk category. The classification module may use the
results of the tuning to classify new applications.

The tuning module may use the Dominance algorithm to compute the Pareto-best and
the Pareto-worst sets for each risk category. Given a set of applications A, such as
insurance applications that have been partitioned into i different risk categories by the
underwriter, tuning may use the pseudocode set forth below:

TUNE(A, i) {

    for each risk category r_i {

        Compute and store the indices of the Pareto-best subset O(r_i):
            obtain Dominance(A), enforcing that lower feature values are better.

        Compute and store the indices of the Pareto-worst subset P(r_i):
            obtain Dominance(A), enforcing that higher feature values are better.

    }

}
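
The tuning pseudocode above may be sketched in Python as follows. This is a minimal
sketch under stated assumptions: the names dominance and tune, the representation of
applications as tuples of numeric features, and the use of a label list to express the
partition into risk categories are all hypothetical. The parameter k follows the
Dominance(X,k) convention described above (-1 when lower values are better, +1 when
higher values are better), and the pairwise comparison is a simple quadratic-time
stand-in rather than one of the cited algorithms.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Sequence[float]


def dominance(points: List[Point], k: int) -> List[int]:
    """Indices of the non-dominated points; k = -1 means lower feature values
    are better, k = +1 means higher feature values are better."""
    def dom(a: Point, b: Point) -> bool:
        return (all(k * x >= k * y for x, y in zip(a, b))
                and any(k * x > k * y for x, y in zip(a, b)))
    return [i for i, p in enumerate(points)
            if not any(dom(q, p) for q in points)]


def tune(applications: List[Point],
         labels: List[Hashable]) -> Dict[Hashable, Tuple[List[int], List[int]]]:
    """For each risk category, return the indices (into `applications`) of
    its Pareto-best subset O and its Pareto-worst subset P."""
    by_category = defaultdict(list)
    for idx, category in enumerate(labels):
        by_category[category].append(idx)

    subsets = {}
    for category, idxs in by_category.items():
        pts = [applications[i] for i in idxs]
        best = [idxs[j] for j in dominance(pts, -1)]   # lower is better
        worst = [idxs[j] for j in dominance(pts, +1)]  # higher is better
        subsets[category] = (best, worst)
    return subsets


apps = [(1, 1), (2, 2), (3, 3), (5, 5), (6, 4), (7, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
subsets = tune(apps, labels)
```

In this example, category "B" has two Pareto-best members, (5, 5) and (6, 4), because
neither dominates the other.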

Fig. 38 is a flowchart illustrating the steps involved in the tuning process according to
an embodiment of the invention. At step 3800, each separate risk category is
determined. At step 3802, a set of applications A is divided into the different risk
categories. At step 3804, the Pareto-best subset of the applications within each risk
category is computed. At step 3806, the Pareto-best subset is stored. At step 3808,
the Pareto-worst subset of the applications within each risk category is computed. At
step 3810, the Pareto-worst subset is stored, completing the tuning process at step
3812.

The classification module may use the sets O and P from the tuning process to assign
risk classifications to new applications. According to an embodiment of the
invention, the classification module assigns a risk category to any new application by
checking if a given application satisfies the Bounded_within relation with respect to a
Pareto-best application and a Pareto-worst application for a given rate class. According
to an embodiment of the invention, given a set of unlabeled applications, U, and the
Pareto-best subsets and the Pareto-worst subsets obtained for each of the i risk
categories from tuning, each application in U is assigned a risk category. Assignment
of a risk category may be carried out according to the pseudocode set forth below
using the principle of dominance-based risk classification:

Fig. 39 illustrates the steps involved in the classification process according to an
embodiment of the invention. At step 3902, an application Z is selected from U. At
step 3904, a risk category r_k is selected. At step 3906, a determination is made
whether application Z is bounded within some x ∈ O(r_k), y ∈ P(r_k). If not, a
determination is made if there is another risk category r_k, at step 3908. If there is
another r_k, the process returns to step 3904. If there is no other r_k, application Z is
declared unresolved at step 3910, and a determination is made if there is another
application Z at step 3912. If there is another application Z, the process returns to
step 3902. If there is no other application Z, the process ends at step 3916.

Returning to step 3906, if application Z is bounded, risk category r_k is assigned to
application Z at step 3914. The process then moves on to step 3912 to determine if
there is another application Z.
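
The loop of Fig. 39 may be sketched as follows. This is an illustrative sketch only:
the function names, the dictionary layout of the tuned subsets, and the strict
dominance test (lower feature values are better) are assumptions. The sketch assigns
the first risk category for which the bound holds and returns None for an unresolved
application; it does not attempt to detect the case where more than one category
qualifies.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Assumed definition: a is no worse than b everywhere and strictly
    better somewhere (lower feature values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def classify(unlabeled: List[Point],
             best: Dict[Hashable, List[Point]],
             worst: Dict[Hashable, List[Point]]) -> List[Optional[Hashable]]:
    """best[r] and worst[r] hold the Pareto-best (O) and Pareto-worst (P)
    applications of risk category r, as produced by tuning."""
    assignments = []
    for z in unlabeled:                      # step 3902
        assigned = None                      # None means "unresolved"
        for r in best:                       # steps 3904 and 3908
            bounded = (any(dominates(x, z) for x in best[r])
                       and any(dominates(z, y) for y in worst[r]))
            if bounded:                      # step 3906
                assigned = r                 # step 3914
                break
        assignments.append(assigned)
    return assignments


best = {"A": [(1, 1)], "B": [(5, 5)]}
worst = {"A": [(3, 3)], "B": [(7, 7)]}
result = classify([(2, 2), (6, 6), (4, 4)], best, worst)
```

Here the application (4, 4) lies between the two categories, so neither bound holds and
it is left unresolved.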

When assigning a risk category, such as according to the pseudocode steps illustrated
previously or according to the steps of Fig. 39, there may be situations that need to be
accounted for in the above risk assignment algorithm. One example is where there is
no risk category for which the Bounded_within condition is satisfied for A[j].
Another example is where there are at least two risk categories for which the
Bounded_within condition is satisfied for A[j]. Each of the above two situations can
lead to a different kind of ambiguity. Other situations may also lead to various types
of ambiguity.

According to an embodiment of the invention, where there is no risk category for
which the Bounded_within condition is satisfied for A[j], an application may be
regarded as ambiguous by the system. No risk category is assigned to the application
and the application is marked as unresolved.

The comparison matrix 4000 illustrated in Fig. 40 provides an example of the
performance of the system for a particular set of applicants. In the example illustrated
in Fig. 40, the system initially used the tuning set in order to compute the Pareto-best
and the Pareto-worst subsets for each of the risk categories, which in this case are
eight risk categories. The system may then classify a set of applications that were not
in the tuning set. For these applications, risk assignments were also obtained from the
human underwriters. This allows a comparison of the performance of the system with
that of the experts using the comparison matrix.

As mentioned earlier, an application that does not satisfy the Bounded_within relation
for any of the risk categories is marked as unresolved by the system. These
applications are shown in the column 4002 labeled "UW." As can be seen, quite a
large number of applicants were marked as unresolved by the system. However, for
the applications that were assigned a risk category by the system, the system was
accurate 100% of the time. Thus, 52 applications were correctly classified in column
4004 labeled "PB," 22 applications were correctly classified in column 4006 labeled
"P," 16 applications were correctly classified in column 4008 labeled "Sel," 10
applications were correctly classified in column 4010 labeled "Std+," 3 applications
were correctly classified in column 4012 labeled "Std," 28 applications were correctly
classified in column 4014 labeled "P Nic," 8 applications were correctly classified in
column 4016 labeled "Std+Nic," and 3 applications were correctly classified in
column 4018 labeled "Std Nic." Hence, the principle of dominance based risk
classification presented in this letter has the potential to produce risk assignments with
a high degree of confidence. For the few applications that are misclassified above,
another system, called the dominance based outlier detection system, may be
used. The dominance based outlier detection system has been described above.

As can be seen from the example of Fig. 40, the classifier is 100% accurate, but may
have a lower coverage, meaning that it does not provide a decision for a large number
of cases. A different tradeoff may be achieved between relative accuracy and
coverage of the system by allowing a minor relaxation of the classification rule used
in the extreme rate classes (e.g., the best and worst rate class). According to an
embodiment of the invention, one type of modification makes use of the fact that
since the risk categories are totally ordered, the principle of dominance-based risk
classification can be relaxed for the best and the worst risk categories. This relaxation
may therefore be expected to improve the coverage of the automated system. The
basis for this relaxation principle may be seen from understanding that if the
application for applicant X dominates the application for applicant A such that the
risk category assigned to application A is the best risk category for the problem, say
r_best, then the risk category of application X is also r_best:

$$\mathrm{dominates}(X, A) \wedge (r_A = r_{best}) \rightarrow (r_X = r_{best})$$

For example, assume that there is an application X such that it dominates application
A, where it is known that A is assigned the best risk category, i.e.:

$$r_A = r_{best}$$

Since application A belongs to the best risk category, no other applicant can be
assigned a better risk category than application A. In other words, with larger values
of r denoting riskier categories,

$$r_X \geq r_A$$

However, since application X also dominates application A, application X can be no
riskier than application A, which implies that:

$$r_X \leq r_A$$

From this, it can therefore be inferred that:

$$r_X = r_{best}$$

thereby demonstrating the applicability of the relaxation condition described above
with respect to the best classification. Further, the relaxed principle of dominance
based risk classification for the worst risk category can be seen by noting that if
application A dominates application X such that the risk category assigned to
application A is the worst risk category, say r_worst, then the risk category of
application X is also r_worst:

$$\mathrm{dominates}(A, X) \wedge (r_A = r_{worst}) \rightarrow (r_X = r_{worst})$$

For example, assume that there is an application X such that it is dominated by
application A, where it is known that A is assigned the worst risk category, i.e.:

$$r_A = r_{worst}$$

Because application A belongs to the worst risk category, every other applicant
belongs to a risk category that is better than or equal to that of application A. In other
words:

$$r_X \leq r_A$$

However, since application A also dominates application X, application A
must also be no riskier than application X, which implies:

$$r_X \geq r_A$$

From this, it is demonstrated that:

$$r_X = r_{worst}$$

thereby demonstrating the applicability of the relaxation condition described above
with respect to the worst classification. Thus, according to an embodiment of the
invention, the steps for classification remain the same except that during the r_k loop in
Fig. 39, the application at hand is tested for the relaxed conditions described above
for the best and the worst risk categories, respectively, and assigned the risk category
accordingly if one of the conditions is satisfied.
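
The relaxed conditions may be sketched as an extension of the classification loop. The
sketch below is illustrative only and rests on several assumptions: the function names,
the list of categories ordered from best to worst, the strict dominance test with lower
feature values being better, and the choice to try the ordinary bound first and the
relaxed tests afterwards. Testing against the stored Pareto subsets of the extreme
categories is a convenience of this sketch; any application known to belong to the
extreme category would serve as application A.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Assumed definition; lower feature values are better."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def classify_relaxed(z: Point,
                     best: Dict[Hashable, List[Point]],
                     worst: Dict[Hashable, List[Point]],
                     categories: List[Hashable]) -> Optional[Hashable]:
    """categories is ordered from the best to the worst risk category."""
    # Ordinary rule: z is bounded within a category.
    for r in categories:
        if (any(dominates(x, z) for x in best[r])
                and any(dominates(z, y) for y in worst[r])):
            return r
    r_best, r_worst = categories[0], categories[-1]
    # Relaxed rule for the best category: z dominates one of its members.
    if any(dominates(z, a) for a in worst[r_best]):
        return r_best
    # Relaxed rule for the worst category: one of its members dominates z.
    if any(dominates(a, z) for a in best[r_worst]):
        return r_worst
    return None  # unresolved


cats = ["A", "B"]
best = {"A": [(2, 2)], "B": [(5, 5)]}
worst = {"A": [(3, 3)], "B": [(7, 7)]}
```

With these subsets, (1, 1) and (8, 8) fall outside every bound but are resolved by the
relaxed tests, whereas (4, 4) remains unresolved.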

The comparison matrix 4100 shown in Fig. 41 illustrates performance of the
dominance based risk classifier used after incorporating the relaxed conditions
defined above during classification of an applicant, and tested against a case base of
approximately 541 cases. Coverage of the classifier has improved, since 68
applicants that were initially marked as unresolved by the classifier are now assigned
a risk category. Whereas the relative accuracy of the new classifier is not 100% like
its counterpart, the number of misclassifications is relatively few. In other words, for
a large gain in coverage, the overall drop in accuracy obtained by the use of the
modified classifier may be relatively minor. Thus, the relaxation conditions may
permit a tradeoff between accuracy and coverage of the dominance based risk
classifier. Where the relative accuracy is more important for a problem, the earlier
version of the classifier may be used. On the other hand, if some problem requires
that more applicants be assigned a risk category, then it may be more desirable to use
the modified classifier. This imparts flexibility to the system on the whole since it can
cater to varying requirements of accuracy and coverage from the automated system,
which is an added advantage of the system.

4. Multivariate Adaptive Regression Splines 

According to an embodiment of the invention, a network of multivariate adaptive
regression splines ("MARS") based regression models may be used to automate
decisions in business, commercial, or manufacturing processes. Specifically, such a
method and system may be used to automate the process of underwriting an
application as applicable to the insurance business.

According to an embodiment of the invention, a MARS based system may be used as
an alternative to a rules-based engine ("RBE"). U.S. Patent Application Serial Nos.
10/173,000, filed on June 18, 2002, and 10/171,575, filed on June 17, 2002, titled "A
Method/System of Insurance Underwriting Suitable for Use By An Automated
System," the contents of which are incorporated herein by reference in their entirety,
describe a fuzzy rule-based system. A MARS model may not be as transparent as
other decision engines (e.g., "RBE"), but may achieve better accuracy. Therefore,
MARS may be used as an alternative approach for a quality assurance tool to monitor
the accuracy of the production decision engine, and flag possible borderline cases for
auditing and quality assurance analysis. Further, a MARS module may be a
regression-based decision system, which may provide simplicity of
implementation of the model since it is based on a mathematical equation that can be
efficiently computed.

According to an embodiment of the invention, a MARS module may facilitate the
automation of the "clean case" (e.g., those cases with no medical complications)
underwriting decision process for insurance products. A MARS module may be used
for other applications as well. A MARS module may be used to achieve a high
degree of accuracy to minimize mismatches in rate class assignment between that of
an expert human underwriter and the automated system. Further, the development of
a parallel network of MARS models may use a set of MARS models as a classifier in
a multi-class problem.

The MARS module is described in the context of a method and system for automating
the decision-making process used in underwriting of insurance applications.
However, it is understood that the method and system may be broadly applicable to
diverse decision-making applications in business, commercial, and manufacturing
processes. Specifically, a structured methodology based on a multi-model parallel
network of MARS models may be used to identify the relevant set of variables and
their parameters, and build a framework capable of providing automated decisions.
The parameters of the MARS-based decision system are estimated from a database
consisting of a set of applications with reference decisions against each application.
Cross-validation and development/hold-out may be used in combination with
re-sampling techniques to build a robust set of models that minimize the error between
the automated system's decision and the expert human underwriter. Furthermore, this
model building methodology may be used periodically to update and maintain the
family of models, if required, to assure that the family of models is current.

Fig. 42 is a flowchart illustrating a process for building a MARS module according to
an embodiment of the invention. At step 4205, one or more applications (also
referred to as cases) are digitized. Digitization may include assuring that the key
application fields required by the model to make a decision are captured in digital
form by data entry.

In step 4210, a case base is formed. Creating a case base may include assuring that
the records corresponding to each application (e.g., case) are stored in a Case Base
(CB) to be used for model construction, testing, and validation. In step 4215,
preprocessing of cases occurs. Preprocessing may include one or more sub-steps. By
way of example, preprocessing may involve location translation and truncation 4216,
such as focusing on values of interest for each field. Further, preprocessing may
involve range normalization 4217, such as normalizing values to allow for
comparison along several fields. Preprocessing may also involve tag encoding 4218,
where tag encoding includes augmenting a record with an indicator, which embodies
domain knowledge in the record by evaluating coarse constraints into the record
itself.

In step 4220, partitioning and re-sampling occurs. According to an embodiment of
the invention, five-fold partitioning may be used, with a stratified sampling within
each rate class used to create five disjoint partitions in the CB. In step 4225,
generation of a development and validation set occurs. Each partition may be used
once as a validation set, with the remaining four used as training sets. This may occur
five times to achieve reliable statistics on the model performance and robustness.

At step 4230, one or more model building experiments occur. Experiments with
modeling may involve modeling techniques such as global regression and
classification and regression trees ("CART") to determine rate classes from a case
description. This may result in the selection of MARS as the modeling paradigm.

At step 4235, a parallel network of MARS models is implemented. According to an
embodiment of the invention, implementation of networks of MARS models may be
used to improve classification accuracy.

According to an embodiment of the invention, the MARS model(s) described may be
used as an input to a fusion module. Fusion of multiple classifiers based on MARS,
Case-based Reasoning, Neural Networks, etc., may be used to improve classification
reliability, as described above. The steps of the process illustrated in Fig. 42 will now
be described in greater detail.

At step 4205, cases are digitized and at step 4210, a case base is formed. According
to an embodiment of the invention, a MARS model framework starts from a database
of applications with the corresponding response variable (e.g., rate class decisions)
provided for each. This may be done via cooperative case evaluation sessions with
experienced underwriters, or may be accomplished via the reuse of previously
certified cases. This database of applications is hereby referred to as a "Certified
Case Base" or a "Case Base." According to an embodiment of the invention, it is
assumed that the characteristics of the certified case base closely match those of
incoming insurance applications received in a reasonable time window, i.e., they form
a "representative sample." The Case Base may form the basis of all MARS model
development.

At step 4215, pre-processing occurs. According to an embodiment of the invention,
one of the first steps in the model development process is to study the data and its
various characteristics. This process may ensure that adequate attention is given to
the understanding of the problem space. Later, appropriate pre-processing steps may
be taken to extract the maximum information out of the available data via a choice of
a set of explanatory variables that have the maximum discriminatory power.
According to an embodiment of the invention, as illustrated in Fig. 43, one of the
early findings was the fact that for most of the candidate variables that were chosen
on the basis of experience and judgment of the human underwriting experts, the
decision boundary regions as indicated by the human experts start at the tail-end of
the variable distribution.

As described above, the decision problem may be to classify each applicant into risk
classes, which are typically increasing in risk. Thus, as an example, the attribute
denoted by the level of cholesterol in the blood of an individual may be considered. It
is a known fact that a cholesterol level below 220 can be treated as almost normal.
This suggests that in cases where the cholesterol level is at a certain level, such as up
to about 240 at demarcation 4302, the human expert does not perceive a significant
risk due to this factor. Thus, all cases with a cholesterol reading below this threshold
can be grouped into a single class, e.g., "Class 1," 4304, and the members in this class
would not consequently impact the response variable (e.g., the rate class decision).
As shown, a cholesterol level value of 240 is close to the 75th quantile 4306 of the
distribution, while the value of 270 is in the 90th quantile range 4308.

One of the sub-steps may include location transformation and truncation 4216. A
location transformation may be considered for all variables that exhibit the above
property. Each variable may be transformed by subtracting out its normal value. This
is realized by combining the knowledge of human experts as well, since for the
majority of the attributes that are health related, there are well-documented and
published normal thresholds.

According to an embodiment of the invention, it may not be desirable to differentiate
among points within the normal ranges. Further, to focus the classifier on those in the
abnormal range, the values of the variable may be saturated after a location
transformation. In this case, the positive values may be considered, e.g.:

$$\mathrm{NewValue} = \max(0,\ \mathrm{OldValue} - \mathrm{ReferenceValue})$$

The above is not a limitation of the general pre-processing step as would be applicable
in other problems, but is a step relevant to the problem domain. There were variables
which had the decision boundaries distributed fairly evenly over the entire range and
did not warrant this specific transformation.
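
The location transformation and truncation may be sketched in a few lines. The
function name and the sample readings are hypothetical; the reference value of 240 is
taken from the cholesterol demarcation discussed above and is used here purely for
illustration.

```python
def shift_and_truncate(old_value: float, reference_value: float) -> float:
    """NewValue = max(0, OldValue - ReferenceValue): readings at or below the
    reference collapse to 0, so only the abnormal range is differentiated."""
    return max(0.0, old_value - reference_value)


# Hypothetical cholesterol readings with an illustrative reference of 240.
readings = [180.0, 240.0, 270.0]
transformed = [shift_and_truncate(v, 240.0) for v in readings]
# transformed == [0.0, 0.0, 30.0]
```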

Further, another sub-step may include range normalization 4217. If it is desirable to
compute distances in a multi-dimensional space, e.g., to find the closest points to a
given one, it may be necessary to normalize each dimension. Range normalization is
typically the most common way to achieve this, e.g.:

$$\mathrm{NewValue}\% = \frac{\mathrm{NewValue} - \min_i(\mathrm{NewValue}_i)}{\max_i(\mathrm{NewValue}_i) - \min_i(\mathrm{NewValue}_i)}$$
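
A minimal sketch of range normalization over one field follows. The function name is
hypothetical, and mapping a constant field to zeros is an assumption made here to
avoid division by zero; the text above does not address that case.

```python
from typing import List


def range_normalize(values: List[float]) -> List[float]:
    """Rescale one field to [0, 1] using its observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant field carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = range_normalize([0.0, 15.0, 30.0])
# normalized == [0.0, 0.5, 1.0]
```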



Another sub-step may involve "tag" encoding 4218. According to an embodiment of
the invention, a specialized set of variable encoding may also be used to extract the
maximum information out of the decision space. This encoding may be referred to as
the "tag." The tag is essentially an ordinal categorical variable developed from a
collection of indicators for the various decision boundaries as defined by human
experts. These indicators are evaluated for each relevant variable in the collection.
The maximum of the individual indicators over the collection of variables results in
the final "tag." For example, assume that there are four key variables (out of a larger
number of fields in the case) that are highlighted by actuarial studies to determine
mortality risk. Since the same studies indicate the critical thresholds that impact such
risk, there is no reason to re-learn those thresholds. Therefore, they may be encoded
in the indicator "tag." Table 5 below illustrates four variables: Nicotine History
(NH), Body Mass Index (BMI), Cholesterol Ratio (Chol. Rat.), and Cholesterol Level
(Chol. Lev.), and four groups of rules, one for each variable. According to this
example, the value of the tag starts with a default of 1 and is modified by each
applicable rule set. A running maximum of the tag value is returned at the end, as the
final result of tag.

Thus, a tag may provide a utilization of the available human expert knowledge to
obtain a boost in accuracy. By way of example, the models were built with and
without the inclusion of the specialized "tag" variable, and inclusion of the
tag results in an improvement in accuracy by about 1-2% on average.
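
The running-maximum logic of the tag may be sketched as follows. All variable names,
thresholds, and tag levels in this sketch are hypothetical placeholders chosen for
illustration; they are not the rule sets of Table 5. Only the mechanism, a default of 1
raised by each applicable rule, follows the description above.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL rule sets: for each variable, (threshold, tag level) pairs.
# A rule applies when the variable's value exceeds the threshold.
EXAMPLE_RULES: Dict[str, List[Tuple[float, int]]] = {
    "bmi": [(30.0, 2), (35.0, 3)],
    "chol_level": [(240.0, 2), (270.0, 3)],
}


def compute_tag(record: Dict[str, float],
                rules: Dict[str, List[Tuple[float, int]]] = EXAMPLE_RULES) -> int:
    """Start from the default tag of 1 and keep a running maximum over every
    applicable rule of every rule set."""
    tag = 1
    for variable, levels in rules.items():
        value = record.get(variable)
        if value is None:
            continue  # field absent from this record
        for threshold, level in levels:
            if value > threshold:
                tag = max(tag, level)
    return tag


tag_a = compute_tag({"bmi": 25.0, "chol_level": 250.0})  # 2
tag_b = compute_tag({"bmi": 36.0, "chol_level": 200.0})  # 3
```

The resulting tag would then be appended to the record as one more explanatory
variable.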

At step 4220, five-fold partitioning and resampling occurs, while a development and
validation set is generated at step 4225. According to an embodiment of the
invention, a stratified sampling methodology may be used to partition the data set into
five equal parts. The stratification was done along the various rate classes to ensure a
consistent representation in each partitioned sample. Further, a simple re-sampling
technique may be used based on reusing each partition by taking out one part (done
five times without replacement) as a holdout and recombining the remaining four and
using it as a development sample to build a complete set of MARS models. This may
be done five times, as mentioned earlier. By way of example, such a resampling and
recombination was performed and the results were compared for consistency in
accuracy, and also to note any fundamental shift in models. The accuracy measures
were found to be closely grouped in the 94.5%-95.5% neighborhood, with model
consistency throughout.
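
The stratified five-fold partitioning and the hold-out rotation may be sketched as
follows. This is a minimal sketch using only the Python standard library; the function
names, the fixed random seed, and the round-robin dealing of each rate class into the
folds are assumptions of the example rather than details taken from the description.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Tuple


def stratified_folds(labels: List[Hashable], n_folds: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Split case indices into disjoint folds, dealing the cases of each rate
    class round-robin so every fold sees a similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % n_folds].append(idx)
    return folds


def development_validation_splits(
        folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Use each fold once as the hold-out; the rest form the development set."""
    for i, holdout in enumerate(folds):
        development = [idx for j, fold in enumerate(folds) if j != i
                       for idx in fold]
        yield development, holdout


rate_classes = ["Preferred"] * 10 + ["Standard"] * 5
folds = stratified_folds(rate_classes)
splits = list(development_validation_splits(folds))
```

Each of the five splits would be used to build and validate a complete set of models,
and the resulting accuracy figures compared for consistency.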

At step 4230, model-building experiments are performed. According to an
embodiment of the invention, a variety of exploratory regression models may be built
and trained on the CB development sets. Further, their classification accuracy may be
tested and validated on the CB validation sets. According to an embodiment of the
invention, a parallel network of MARS models may evolve and develop from a global
regression model and a classification and regression trees ("CART") model, and
allows the use of MARS in the framework of a multi-class classification problem.
The global regression model and the classification and regression trees ("CART")
model will now be described in greater detail below.

Since this is a multi-class classification problem, by definition the response variable is
a polychotomous categorical variable, i.e., a variable that can take values from a set of
labels (e.g., "Preferred Best," "Preferred," "Select," "Standard Plus," "Standard").
However, since in this case the response is ordinal (the order of the categorical values
reflects the corresponding increasing risk), a risk metric may be obtained, such as from
an actuarial department of the insurance company. This allows the mapping of the
categorical values to numerical values (e.g., reflecting mortality risk) and treating the
response variable as a continuous one in order to fit a global multivariate linear
regression. Using this method, a moderate fit to the data is obtained. However, the
maximum accuracy achieved was about 60%, far from the desired accuracy level of
above 90%.

Additionally, a CART based model may be built using the data. To maintain
robustness and to avoid the possibility of overfitting the model, it may be necessary to
minimize the structural complexity of the CART model. This approach yielded a
CART tree with about 30 terminal nodes. Its corresponding accuracy level was
substantially better than the global regression and was about 85%. Increasing the
accuracy for the training sets would have resulted in deeper, more complex trees, with
a larger number of terminal nodes. Such trees would exhibit overfitting tendencies and
poor generalization capabilities, leading to low accuracy and robustness when
evaluated on the validation sets.

From these experiments, it can be determined that a global regression model, which is
essentially a main-effects fit, has moderate explanatory power, but a CART tree,
which is a local non-parametric model, has a much better performance. Since CART
is essentially a pure interaction-based model, the motivation for a MARS based
modeling schema was obvious, as MARS allows both main and interaction effects to
be incorporated into the model, and being a piecewise-linear adaptive regression
procedure, MARS can approximate very well any non-linear structure (if present).
Since the original motivation of development of the MARS algorithm stemmed from
the problem of discontinuity of CART terminal node estimates, the same benefits may
apply here.
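
To make the piecewise-linear structure concrete, the sketch below evaluates a model
built from hinge basis functions of the form max(0, x - t) and their products, which is
the general form of a MARS model. It only evaluates a given model and does not
perform the adaptive search for terms and knots. The variable names, knots, and
coefficients are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# One factor of a basis function: (variable name, knot, direction).
Factor = Tuple[str, float, int]


def hinge(x: float, knot: float, direction: int = +1) -> float:
    """max(0, x - knot) for direction +1, max(0, knot - x) for direction -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(record: Dict[str, float], intercept: float,
                 terms: List[Tuple[float, List[Factor]]]) -> float:
    """Sum of coefficient * product-of-hinges. A single-factor term is a main
    effect; a multi-factor term is an interaction."""
    y = intercept
    for coefficient, factors in terms:
        basis = 1.0
        for variable, knot, direction in factors:
            basis *= hinge(record[variable], knot, direction)
        y += coefficient * basis
    return y


# HYPOTHETICAL model: one main effect and one interaction term.
terms = [
    (0.5, [("chol", 240.0, +1)]),
    (0.01, [("chol", 240.0, +1), ("bmi", 30.0, +1)]),
]
score = mars_predict({"chol": 260.0, "bmi": 32.0}, 1.0, terms)
```

Because each hinge is zero on one side of its knot, a case with normal readings on
every variable contributes only the intercept.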

At step 4235, a parallel network of MARS models is implemented. According to an
embodiment of the invention, one issue involved the difficulty of global models to
incorporate the jumps in decision boundaries of the majority of the variables in an
extremely small bounded range. In other words, since the decision boundaries begin
only after the 75th quantile value of the explanatory variable, the shift over all other
decision variables usually occurs by the 95th quantile. This issue may be addressed in a
number of ways. According to one approach, "tag" encoding as explained above
helps the MARS search algorithm to find the "knots" in the right place.

According to another approach, a "parallel network" arrangement of models may be
used. A parallel network arrangement is a collection of MARS models, each of which
solves a binary, or two-class, problem. This may take advantage of the fact that the
response variable is ordinal, e.g., the decision classes, being risk categories, are
increasing in risk. The approaches to these issues should not be considered as
limitations of the methodology presented here, but rather a property explored in order
to achieve better results. In addition, the above case generalizes to handle problems
where the response may not be ordinal.

Advantage may be taken of the order of the response variable by building two
models for every rate class, except the boundary classes, with one model for
each side. For easier reference, the two models may be referred to as the left model
and the right model. Fig. 44 illustrates an example of such models. A population
4402 is divided into non-smoking applications 4404, non-underwritten applications
4406 and nicotine applications 4408. The "Preferred" class has been broken down
into a "Preferred Left" model 4410 and "Preferred Right" model 4412. The minimum
of the two models is selected, e.g., M(Pref) = min (L,R), 2814. The results are then
input into the aggregation module 4416, which aggregates all results from the binary
classifiers and selects the rate class that best fits a given application. For example, for
the rate class "Preferred," two models are built which estimate class membership
value. The "Left" model distinguishes all preferred cases from cases of classes
which are to the left of preferred, while the "Right" model does the opposite. The
final class membership value may be the minimum of these two membership values
obtained. Further, in the general case where there is no known order amongst classes,
the Left/Right models may collapse into a single model providing one estimated
membership value.
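
The combination of the Left and Right models and the subsequent aggregation may be
sketched as follows. The function names and the membership values are hypothetical,
and selecting the rate class with the largest combined membership is an assumption
about how "best fits" might be decided; the description above does not fix the
aggregation rule.

```python
from typing import Dict, Optional, Tuple


def class_membership(left: Optional[float], right: Optional[float]) -> float:
    """M(class) = min(L, R). A boundary class has a model on one side only,
    represented here by None for the missing side."""
    scores = [s for s in (left, right) if s is not None]
    return min(scores)


def aggregate(model_outputs: Dict[str, Tuple[Optional[float], Optional[float]]]
              ) -> Tuple[str, Dict[str, float]]:
    """Combine each class's Left/Right outputs, then select the class with
    the largest membership (assumed aggregation rule)."""
    memberships = {rate_class: class_membership(left, right)
                   for rate_class, (left, right) in model_outputs.items()}
    selected = max(memberships, key=memberships.get)
    return selected, memberships


# HYPOTHETICAL outputs of the binary models for one application.
outputs = {
    "Preferred Best": (None, 0.2),
    "Preferred": (0.9, 0.7),
    "Select": (0.3, 0.8),
    "Standard": (0.1, None),
}
selected, memberships = aggregate(outputs)
```

Taking the minimum means a class is supported only when both of its binary models
agree that the application belongs on their side of the boundary.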

According to an embodiment of the invention, the MARS methodology may be
adapted to handle logistic regression problems in the classical sense. Such an
adaptation would need the lack-of-fit ("LOF") criterion to be changed
from least squares to logistic. However, the logistic regression procedure is in itself a
likelihood maximization problem that is typically solved by using an iteratively
re-weighted least squares ("IRLS") algorithm or its counterparts. The viability of
MARS may depend on the fast update criteria of the least squares LOF function,
which an IRLS logistic estimation would generally prohibit.

According to an embodiment of the invention, an approximation may be made by
feeding the final set of MARS variables back into a SAS logistic routine and refitting.
As said before, this is an approximation because if one could ideally use a logistic LOF
function, then one could have derived the optimal set of logistic candidate variable
transforms. However, a re-fit process may still achieve the same degree of fit and
provide model parsimony in some of the subset models built. Also, since the logistic
function is a (0,1) map, this gives class membership values that can be treated as
probabilities.

According to an embodiment of the invention, a MARS module may be implemented 
with software code in SAS and using MARS, where the code has been trained and 
tested using the five-fold partitions method described above. By way of example of 
the results of such an implementation, Fig. 45 illustrates a comparison matrix 4500. 
Such a matrix (with a dimensionality of k x k), whose k columns contain the set of 
possible decisions available to the classifier, and whose k rows contain the correct 
corresponding standard reference decision, can describe a classifier's performance 
on a given data set. 

In this example, agreement between the classifier and the standard reference decision 
occurs when the case falls on the main diagonal of matrix 4500, while any other cell 
above or below the main diagonal contains misclassified cases. In the illustrative 
example depicted in Fig. 45, for the second row of 4502, labeled "Preferred," 360 out 
of a total of 374 cases were correctly assigned to that rate class, while 1 was assigned 
to "P-Best," 11 to "Select," 1 to "Standard" and 1 to "Send to Underwriter." 

As shown in Fig. 46, 4602 refers to the total number of agreements between the 
classifier and the standard reference decisions for non-smokers, while 4608 refers to 
the total number of agreements between the classifier and the standard reference 
decisions for smokers. The notations 4604 and 4606 refer to the total number of 
disagreements between the classifier and the standard reference decisions for non- 
smokers, while 4610 and 4612 refer to the total number of disagreements between the 
classifier and the standard reference decisions for smokers. 4614 refers to the total 
number of agreements not to make a decision and send the case to UW (e.g., 
underwriter) and notations 4616 and 4618 refer to the total number of disagreements 
not to make a decision and send to UW. 

Further, the matrix depicted in Fig. 46 may be used to illustrate the performance 
measures used in the evaluation of the classifiers. Let N be the total number of cases 
considered (in this example, 2,920). According to the annotation in Fig. 46, N = 
m1 + m2 + m3 + m4 + m5 + m6 + m7 + m8 + m9. In this example, N2 = 182, which 
is the sum of all cases that should have been sent to the human underwriter (i.e., m9 + 
m7 in Fig. 46), and therefore N1 = (2,920 - 182) = 2,738. Three measures of 
performance for the classifier may be used, where M(i,j) is a cell in the matrix shown 
in Fig. 45: 

• Coverage: the total number of decisions made by the classifier as a percentage 
of the total number of cases considered, i.e.: 

\[ Coverage = \frac{\sum_{i=1}^{k} \sum_{j=1}^{k-1} M(i,j)}{\sum_{i=1}^{k} \sum_{j=1}^{k} M(i,j)} \]

Using the annotations defined in Fig. 46, coverage may be redefined as: 

\[ Coverage = \frac{(m_1 + m_2 + m_3) + (m_4 + m_5 + m_6) + m_9}{(m_1 + m_2 + m_3) + (m_4 + m_5 + m_6) + (m_7 + m_8 + m_9)} \]

Thus, in the example depicted in Fig. 45 the coverage is: (2,920 - 242)/2,920 = 91.71%. 
An additional performance measure may include: 

• Relative Accuracy: the total number of correct decisions made by the 
classifier as a percentage of the total number of decisions made, i.e.: 

\[ RelativeAccuracy = \frac{\sum_{i=1}^{k-1} M(i,i)}{\sum_{i=1}^{k} \sum_{j=1}^{k-1} M(i,j)} \]

Using the annotations defined in Fig. 46, the relative accuracy may be redefined as: 

\[ RelativeAccuracy = \frac{m_1 + m_4}{(m_1 + m_2 + m_3) + (m_4 + m_5 + m_6) + m_9} \]

In the example depicted in Fig. 45 the relative accuracy is: (2,558)/(2,920 - 242) = 
95.52%. A further performance measure may include: 

• Global Accuracy: the total number of correct decisions made by the classifier 
as a percentage of the total number of cases considered, i.e.: 

\[ GlobalAccuracy = \frac{\sum_{i=1}^{k} M(i,i)}{\sum_{i=1}^{k} \sum_{j=1}^{k} M(i,j)} \]

Again, using the annotations defined in Fig. 46, the global accuracy may be redefined 
as: 

\[ GlobalAccuracy = \frac{m_1 + m_4 + m_7}{(m_1 + m_2 + m_3) + (m_4 + m_5 + m_6) + (m_7 + m_8 + m_9)} \]

In the example depicted in Fig. 45 the global accuracy is: 2,734/2,920 = 93.63%. 
Coverage and relative accuracy may be competing objectives. By establishing a 
confidence metric for the classifier output, one could adjust a confidence threshold to 
achieve various tradeoffs between accuracy and coverage. At one extreme, one could 
have a very low threshold, accepting any output (this would yield 100% coverage but 
very low accuracy). At the other extreme, one could have a very high confidence 
threshold. This would drastically reduce coverage but increase relative accuracy. 
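By way of a non-limiting illustration, the three performance measures may be computed from a k x k comparison matrix as sketched below. The sketch assumes that rows hold the standard reference decisions, columns hold the classifier decisions, and the last row and column correspond to "send to underwriter" (no decision). The function name and the 3 x 3 example matrix are hypothetical and are not taken from Fig. 45.

```python
def performance(matrix):
    """Return (coverage, relative_accuracy, global_accuracy).

    Rows are reference decisions and columns are classifier decisions;
    the last row/column is assumed to mean "send to underwriter".
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Decisions made: every cell outside the last ("no decision") column.
    decided = sum(sum(row[:k - 1]) for row in matrix)
    # Correct decisions: diagonal cells, excluding the "no decision" cell.
    correct_decided = sum(matrix[i][i] for i in range(k - 1))
    # Global accuracy also credits agreed referrals to the underwriter.
    correct_all = correct_decided + matrix[k - 1][k - 1]
    return decided / total, correct_decided / decided, correct_all / total


# Hypothetical matrix: two rate classes plus "send to underwriter".
toy = [
    [50, 3, 2],
    [4, 30, 1],
    [1, 1, 8],
]
coverage, relative_accuracy, global_accuracy = performance(toy)
print(coverage, relative_accuracy, global_accuracy)
```

Applied to the aggregate counts stated above, the same ratios give 2,678/2,920 for coverage, 2,558/2,678 for relative accuracy, and 2,734/2,920 for global accuracy.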

The results of networks of MARS (or Neural Networks, as described below) models 
could also be post-processed to establish an alternative confidence metric that could 
be used to achieve other tradeoffs between accuracy and coverage. The tables set 
forth in Fig. 47 describe the performance of the network of MARS models on each of 
the five partitions. For each partition, the global and relative accuracy is listed, with 
the corresponding coverage. The results are shown with and without post-processing. 

Each of these partitions (e.g., Partition 1, 4710, Partition 2, 4720, Partition 3, 4730, 
Partition 4, 4740 and Partition 5, 4750) shows the performance results of the network 
of MARS models applied to 80% of the data used to build the model (training set 
4760) and 20% of the data that was withheld from the model construction (validation 
set 4770). The tables in Fig. 48 summarize the minimum 4810, maximum 4820, and 
average 4830 results of applying the network of MARS models to the five partitions. 

These tables illustrate that the average performance of a network of MARS models, 
applied to the five partitions, was very accurate. In particular, a relative accuracy of 
95% on the validation set 4840 of Fig. 48, with coverage of about 90%, may be 
extremely good and useful for quality assurance. An analysis of the minimum and 
maximum achieved may also show a high level of robustness, exemplified by the 
relatively tight range of performance values. 

The technical considerations that go into a MARS model are well known and can be 
found in Friedman's original paper in the Annals of Statistics, the contents of which 
are incorporated herein by reference. However, to better illustrate the present 
invention, it is useful to describe a few basic points adopted in the MARS tuning as 
well as some additional steps that may be necessary to ensure a robust model building 
process. 

General MARS parameters may include overfit and cost-complexity pruning, cross- 
validation, and multi-collinearity. According to an embodiment of the invention, 
MARS is essentially a recursive-partitioning procedure. The partitioning is done at 
points of the various explanatory variables defined as "knots" and overall 
optimization is achieved by performing knot optimization over the lack-of-fit criteria. 
Moreover, to achieve continuity across partitions MARS employs a two-sided power 
basis function of the form: 

\[ b_q^{\pm}(x - t) = [\pm (x - t)]_+^q \]

However, in this case, a piecewise-linear basis (q = 1) is used. Here "t" is the knot 
around which the basis is formed. It may be important to use an optimal number of 
basis functions to guard against possible overfit. By way of example, an experiment 
may be performed with one dataset by starting from a small number of maximal basis 
functions, building it up to a medium size number, and using the cost-complexity 
notion developed in CART methodology and deployed in MARS to prune back and 
find a balance in terms of optimality which provides an adequate fit. In this example, 
the use of cost-complexity pruning revealed that 25-30 basis functions were sufficient. 
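By way of a non-limiting illustration, a piecewise-linear (q = 1) two-sided basis around a knot, and a model built as a weighted sum of such basis functions, may be sketched as follows. The function names, the knot at 30, and the coefficients are hypothetical values chosen for this sketch; they are not parameters of the models described herein.

```python
def basis_pair(x, knot, q=1):
    """Return the right-sided and left-sided truncated power basis values."""
    right = max(0.0, x - knot) ** q   # non-zero only to the right of the knot
    left = max(0.0, knot - x) ** q    # non-zero only to the left of the knot
    return right, left


def piecewise_model(x, intercept, terms):
    """Evaluate intercept + sum(coef * basis) over (knot, side, coef) terms.

    `side` is "right" or "left" and selects one half of the basis pair.
    """
    value = intercept
    for knot, side, coef in terms:
        right, left = basis_pair(x, knot)
        value += coef * (right if side == "right" else left)
    return value


# Hypothetical model with a single knot at 30 and a different slope per side.
terms = [(30.0, "right", 0.5), (30.0, "left", -0.2)]
for x in (25.0, 30.0, 34.0):
    print(x, piecewise_model(x, 1.0, terms))
```

Because both halves of the pair are zero at the knot, the fitted function is continuous there even though its slope changes.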

Another important criterion that affects the pruning is the estimated degrees of 
freedom allowed. This may be estimated by using ten-fold cross-validation on the data 
set for each model. 

In addition, there is no explicit way by which MARS can handle multi-collinearity. 
However, MARS does provide a parameter that penalizes the separate choice of 
correlated variables in a downstream partition. MARS then works with the original 
parent instead of choosing other alternates. According to an embodiment of the 
invention, a medium penalty may be used to take care of this problem. 

Further, optimization of cut-offs using evolutionary algorithms ("EA") may be used. 
When a new case comes in, it is evaluated by the complete set of models and a class 
membership distribution is obtained for every incoming case. Next in line comes the 
problem of assigning rate classes to the incoming case. One alternative may be to use 
hand-tuned cut-offs computed through simple tools such as the Microsoft Excel solver. 
These results may be compared to an EA-based optimized cut-off set. By way of 
example, an evolutionary algorithm may provide a boost in accuracy of about 1% as 
compared to the hand-tuned cut-offs. 
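By way of a non-limiting illustration, an evolutionary search for a single cut-off may be sketched as follows. This is a deliberately simple elitist mutation search over one threshold and is an assumption made for illustration; the evolutionary algorithm, the fitness function, and the number of cut-offs actually used are not specified above. The scores, labels, and all parameter values are hypothetical.

```python
import random


def accuracy(cutoff, scores, labels):
    """Fraction of cases where (score >= cutoff) matches the 0/1 label."""
    hits = sum((s >= cutoff) == bool(y) for s, y in zip(scores, labels))
    return hits / len(scores)


def evolve_cutoff(scores, labels, start=0.5, generations=50,
                  offspring=10, sigma=0.1, seed=0):
    """Mutate the current best cut-off; keep a child only if it is better."""
    rng = random.Random(seed)
    best, best_fit = start, accuracy(start, scores, labels)
    for _ in range(generations):
        for _ in range(offspring):
            # Gaussian mutation, clipped to the [0, 1] membership range.
            child = min(1.0, max(0.0, best + rng.gauss(0.0, sigma)))
            fit = accuracy(child, scores, labels)
            if fit > best_fit:
                best, best_fit = child, fit
    return best, best_fit


# Hypothetical membership scores and reference labels.
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
print(accuracy(0.5, scores, labels), evolve_cutoff(scores, labels))
```

Because a child replaces the incumbent only when it scores strictly higher, the returned cut-off is never worse than the hand-chosen starting value on the data used for the search.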

5. Neural Network Classifier 

Another aspect of the present invention may provide a method and system to 
implement a neural network classifier with multiple classes for automated insurance 
underwriting and its quality assurance. Neural networks may be advantageous, as 
they can approximate any complex nonlinear function with arbitrary accuracy (e.g., 
they are universal functional approximators). Neural networks are generally non- 
parametric and data-driven. That is, they approximate the underlying nonlinear 
relationship through learning from examples with few a priori assumptions about the 
model. In addition, neural networks are able to provide estimates of posterior 
probabilities. Such posterior probability values may be useful for obtaining the 
highest possible decision accuracy in the classifier fusion or other decision-making 
processes. 

There are a variety of types of neural networks. However, neural networks can be 
broadly categorized into two main classes, i.e., feed-forward and recurrent (also called 
feedback) neural networks. Among all these types, multiple-layer feed-forward 
neural networks are often used for classification. Neural networks can be directly 
applied to solve both dichotomous and polychotomous classification problems. 
However, it is generally more accurate and efficient when neural networks are used 
for two-class (i.e., dichotomous) classification problems. As the number of classes 
increases, direct use of multi-class neural networks may encounter difficulties in 
training and in achieving the desired performance. 

As previously described, insurance underwriting problems may often involve the use 
of large numbers of features in the decision-making process. The features typically 
include the physical conditions, medical information, and family history of the 
applicant. Further, insurance underwriting frequently has a large number of risk 
categories (i.e., rate classes). The risk category of an application is traditionally 
determined by using a number of rules/standards, which often have the form of "if the 
value of feature x exceeds a, then the application can't be rate class C, i.e., has to be 
lower than C." These types of decision rules, 4930 and 4940 in Fig. 49, "clip" the 
decision surface. Decision rules interpreted and used by a human underwriter may 
form an overall piecewise-continuous decision boundary, as shown in the graph of 
Fig. 49. 

To design a neural network classifier to achieve a performance (e.g., accuracy and 
coverage) comparable to that of rule-based classifiers for insurance underwriting, 
various issues may need to be addressed. First, a neural network may need to deal with 
a large number of features and target classes. The large number of features and high 
number of target classes call for a high degree of complexity of neural network 
("NN") structure (e.g., more nodes and more parameters to learn, i.e., higher Degrees 
of Freedom ("DOF")). Such complex NN structures may require more training data for 
properly training the network and achieving reasonable generality (performance). 
However, sufficient data may be difficult to obtain. Even with sufficient data, the 
complex neural network structure requires enormous training time and computational 
resources. More importantly, complex NN structures (high DOF) tend to have more 
local minima, and thus training is prone to fall into local minima and fail to achieve 
global minimization. As a result, it is usually difficult to achieve a desired performance 
for a neural network with complex structure. 

Another issue to be addressed involves incorporating domain knowledge into the 
neural network classification process. As discussed before, the discrete rules that 
human underwriters use for risk category assignment form an overall piecewise- 
continuous decision boundary in the feature space, and neural networks may have 
difficulty learning the decision boundary because insufficient data points are 
available. One way to alleviate the difficulty and improve the performance of the 
neural network may be to directly incorporate the rules into the neural network model 
and use these rules as additional information to "guide" network learning. 

One aspect of the present invention is related to a method and system of improving 
the performance of neural network classifiers, so that the neural network classifier can 
perform automated insurance underwriting and its quality assurance with a level of 
accuracy and reliability that is comparable to the rule-based production decision 
engine. Specifically, this invention improves the performance of classifiers by 
decomposing a multi-class classification problem into a series of binary classification 
problems. Each of the binary classifiers may classify one individual class from the 
other classes, and the final class assignment for an unknown input will be decided 
based on the outputs of all of the individual binary classifiers. 

Additionally, as another way to improve the classifier performance, this invention 
incorporates the domain knowledge of the human underwriter into a neural network 
design. The domain knowledge, represented by a number of rules, may be integrated 
into a classifier by using an auxiliary feature, the value of which is determined by the 
rules. Moreover, to further improve the classifier performance, this invention may 
also analyze the outputs of the individual binary classifiers to identify the difficult 
cases for which the classifier cannot make a solid decision. To reduce the 
misclassification rate, these difficult cases may then be sent to a human underwriter 
for further analysis. 

In the conventional design of multi-class neural network classifiers, a single neural 
network contains multiple output nodes. According to an embodiment of the 
invention, a multi-class classification problem may be solved by decomposing the 
multi-class classifier into multiple binary classifiers. For the purposes of illustration, 
assume that a hypothetical life insurance company has risk categories "Cat1", "Cat2", 
"Cat3", "Cat4", and "Cat5". A rating of "Cat1" is the best risk, while "Cat5" is the 
worst. Then, the concept of the multi-class classifier decomposition used in this 
invention can be illustrated in the example of Fig. 50. Each binary classifier (5010, 
5020, 5030) is for one class and is trained to distinguish the specific class (the "class") 
from the rest of the classes combined (the "others"). Before training each of the binary 
classifiers, the training set is relabeled "1" for the data points in the "class" group and 
"0" for the data points in the "others" group. When performing classification for a new 
input case, each of the binary classifiers determines the probability that the new case 
belongs to the class for which the binary classifier is responsible. Therefore, the 
output of each neural network is a number in the [0,1] interval. The final class for the 
new input case is assigned by the MAX decision rule 5040. For example, an 
application may receive a "0.6" and a "1" in the Cat3 and Cat4 categories, respectively, 
and a "0" in the Cat1, Cat2, and Cat5 risk categories. The MAX decision rule 5040 
may then select the Cat4 risk category. 
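By way of a non-limiting illustration, the one-versus-others relabeling of the training set and the MAX decision rule may be sketched as follows. The function names are hypothetical, and the binary classifiers themselves are not shown; only their outputs in the [0,1] interval are assumed.

```python
CLASSES = ["Cat1", "Cat2", "Cat3", "Cat4", "Cat5"]


def relabel(labels, target):
    """One-vs-others targets: 1 for the "class" group, 0 for the "others"."""
    return [1 if label == target else 0 for label in labels]


def max_rule(outputs):
    """Return the class whose binary classifier produced the largest output."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best]


# Relabeling a small hypothetical training set for the Cat3 classifier.
print(relabel(["Cat1", "Cat3", "Cat3", "Cat5"], "Cat3"))  # [0, 1, 1, 0]

# The example from the text: 0.6 for Cat3, 1 for Cat4, 0 elsewhere.
print(max_rule([0.0, 0.0, 0.6, 1.0, 0.0]))                # Cat4
```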

According to an embodiment of the invention, for each of the binary classifiers 
designed in the current invention, the neural network is multiple-layer feed-forward in 
type and has one hidden layer. However, for other applications, using different neural 
network types with more than one hidden layer may be explored for obtaining better 
performance. It is therefore to be understood that the current invention is not limited 
to one-hidden-layer feed-forward neural networks. Instead, the method may work 
equally well for multiple numbers of hidden layers. 

According to an embodiment of the invention, domain knowledge may be integrated 
into neural network learning by representing the knowledge with an auxiliary feature. 
The domain knowledge may be first represented by a series of rules. A typical rule 
has the following format (once again using the afore-mentioned five hypothetical rate 
classes): "If the applicant's cholesterol level exceeds 252, he does not qualify for rate 
class C1, i.e., the best rate class for him is C2". Formally, this rule can be expressed 
in a general IF-THEN rule as follows. 

IF x_i > t_{i,j}, THEN the best available rate class is C_j 

where x_i is the i-th feature, t_{i,j} is the j-th threshold of the i-th feature, and C_j is 
the j-th rate class. The incorporation of domain knowledge is further described below. 

According to an embodiment of the invention, the classifier design process for a 
neural network classifier may comprise data preprocessing, classifier design and 
optimization, and post-processing. These three aspects are described in greater detail 
below. 

Data preprocessing may include range normalization and feature extraction and 
selection. According to an embodiment of the invention, range normalization is a 
process of mapping data from the original range to a new range. Normalization may 
be generally problem specific. However, it is often done either for convenience or for 
meeting the input requirements of the algorithm(s) under consideration. For pattern 
classification problems, one purpose of normalization is to scale all features the 
classifier is using to a common range so that effects due to arbitrary feature 
representation (e.g., different units) can be eliminated. In addition, some classifiers, 
such as neural networks, require a range of input to be normalized. 

One way to normalize data is range normalization. To normalize the data by range, 
the feature value, offset by its minimum, is divided by its range, i.e., the difference 
between the maximum and the minimum of the feature value. Let x_{i,j} be the value 
of the j-th data point of the i-th feature. Then the normalized value y_{i,j} is: 

\[ y_{i,j} = \frac{x_{i,j} - \min(x_i)}{\max(x_i) - \min(x_i)} \qquad (1) \]

The normalized values y_{i,j} will be in the range of [0, 1]. The range normalization 
requires knowing the minimum and the maximum values of the data. The greatest 
advantage of this normalization is that it introduces no distortion to the variable 
distribution, as the instance values and their corresponding normalized values have a 
linear relationship. That is, given two instance values whose offsets from the feature 
minimum differ by a factor of two, their normalized values will differ by the same 
factor. This is why range normalization is also called linear scaling or linear 
transformation. 
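By way of a non-limiting illustration, range normalization of a single feature may be sketched as follows. The sketch assumes that the feature minimum is subtracted before dividing by the range, which is what the stated output interval of [0, 1] implies; the function name and the sample cholesterol values are hypothetical.

```python
def range_normalize(values):
    """Linearly map one feature's values onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range and cannot be rescaled.
        raise ValueError("feature has zero range")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical cholesterol readings for four applicants.
print(range_normalize([150.0, 200.0, 250.0, 300.0]))
```

The minimum maps to 0, the maximum maps to 1, and equally spaced inputs remain equally spaced after scaling.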

Another type of data preprocessing may involve feature extraction/selection. For 
example, raw data is placed within a 20-column spreadsheet. The first column is the 
applicant ID number and the second column is the rate class. Columns 3 through 20 
are the attributes/variables/features for the applicant. Instead of directly using the 18 
original features, two new features are derived. The first derived feature is the body 
mass index ("BMI"). Underwriter experience has shown that the BMI has more 
discriminating power in classification. The second derived feature, tag, is used to 
represent the domain knowledge in neural network training. The two derived features 
are further described below. 

As described above, BMI is defined as the ratio of the weight in kilograms to the 
square of the height in meters. Let wt be the weight in pounds and Ht be the height in 
inches. BMI can be expressed as: 

\[ BMI = \frac{0.4536 \, wt}{(0.0254 \, Ht)^2} \approx 703 \, \frac{wt}{Ht^2} \qquad (2) \]
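By way of a non-limiting illustration, the BMI feature may be derived from a weight in pounds and a height in inches as sketched below. The conversion constants are the standard definitions of the pound and the inch; the function and constant names are hypothetical.

```python
LB_TO_KG = 0.45359237   # kilograms per pound
IN_TO_M = 0.0254        # meters per inch


def bmi(weight_lb, height_in):
    """Body mass index: weight in kilograms over height in meters squared."""
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2


# A hypothetical applicant weighing 180 lb at a height of 70 in.
print(round(bmi(180.0, 70.0), 2))
```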
One approach for incorporating domain knowledge into the neural network modeling 
involves training by hints, as described by Abu-Mostafa (1993), where almost any 
type of prior knowledge can be incorporated into a neural network through 
constructing the hints. Although the technique is flexible, it may be of limited 
application in neural networks. According to an embodiment of the invention, 
domain knowledge is incorporated into the neural network classifier by using an 
artificial feature, such as tag. The tag feature may take different values based on a set 
of rules that represent the domain knowledge. 

By way of example, the five family history features, such as from columns 3-7, are 
condensed and represented by two features, FH1 and FH2. While the FH1 feature has 
the binary values of 0 or 1, FH2 has the triple values of 0, 1, and 2. The values of FH1 
and FH2 are determined by the following rules, where the terms 
age_sib_card_canc_diag, age_moth_card_canc_diag, age_fath_card_canc_diag, 
age_moth_card_death, and age_fath_card_death respectively correspond to the age when 
a sibling of the applicant was diagnosed with a cardiac or cancer disease, the age 
when the mother of the applicant was diagnosed with a cardiac or cancer disease, the 
age when the father of the applicant was diagnosed with a cardiac or cancer disease, 
the age when the mother of the applicant died due to a cardiac disease, and the age 
when the father of the applicant died due to a cardiac disease. For a given applicant, 
one or more of these terms may be not applicable. 

IF (age_sib_card_canc_diag ≤ 60) 
   OR (age_moth_card_canc_diag ≤ 60) 
   OR (age_fath_card_canc_diag ≤ 60), 
THEN FH1 = 1. 
Otherwise, FH1 = 0. 

IF (age_moth_card_death ≤ 60) OR (age_fath_card_death ≤ 60), 
THEN FH2 = 1. 
IF (age_moth_card_death ≤ 60) AND (age_fath_card_death ≤ 60), 
THEN FH2 = 2. 
Otherwise, FH2 = 0. 
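By way of a non-limiting illustration, the two condensed family history features may be computed as sketched below. The sketch assumes that a term that is not applicable is passed as `None` and never satisfies a condition, and that the value FH2 = 1 applies when exactly one parent died of a cardiac disease at or before age 60; the function and parameter names are hypothetical.

```python
def _at_or_before(age, limit=60):
    """True when the event age is known and does not exceed the limit."""
    return age is not None and age <= limit


def family_history(sib_diag=None, moth_diag=None, fath_diag=None,
                   moth_death=None, fath_death=None):
    """Return (FH1, FH2) from the five family history ages (None = n/a)."""
    diagnoses = (sib_diag, moth_diag, fath_diag)
    fh1 = 1 if any(_at_or_before(age) for age in diagnoses) else 0

    mother = _at_or_before(moth_death)
    father = _at_or_before(fath_death)
    if mother and father:
        fh2 = 2
    elif mother or father:
        fh2 = 1
    else:
        fh2 = 0
    return fh1, fh2


# Hypothetical applicant: mother diagnosed at 55, father died at 58.
print(family_history(moth_diag=55, fath_death=58))  # (1, 1)
```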



Examples of rules that may be used to compute TAG are listed below in Table 6. 

Rate Class Name:     PB    P    Sel   St+   Std   UW 
Rate Class Number:   1     2    3     4     5     6 

A) Initialize: V = [1 1 1 1 1 1] 

B) Fire the following rules: 

Variable #   Rule #   Variable     IF         THEN         Threshold   Initial value 
1            A        FH1          = 1        V = V & T    -           - 
2            B        FH2          = 1        V = V & T    -           - 
2            C        FH2          = 2        V = V & T    -           - 
3            1        NH           < t3,2     V = V & T    t3,2        4.0 
3            2        NH           < t3,3     V = V & T    t3,3        2.5 
3            3        NH           < t3,4     V = V & T    t3,4        1.5 
4            4        BMI          > t4,2     V = V & T    t4,2        28.81 
4            5        BMI          > t4,3     V = V & T    t4,3        30.90 
4            6        BMI          > t4,4     V = V & T    t4,4        3[?].60 
4            7        BMI          > t4,5     V = V & T    t4,5        35.05 
4            8        BMI          > t4,6     V = V & T    t4,6        37.55 
5            9        Chol. Rat.   > t5,2     V = V & T    t5,2        5.4 
5            10       Chol. Rat.   > t5,3     V = V & T    t5,3        6.3 
5            11       Chol. Rat.   > t5,4     V = V & T    t5,4        7.3 
5            12       Chol. Rat.   > t5,5     V = V & T    t5,5        8.3 
5            13       Chol. Rat.   > t5,6     V = V & T    t5,6        10 
6            14       Chol. Lev.   > t6,2     V = V & T    t6,2        252 
6            15       Chol. Lev.   > t6,3     V = V & T    t6,3        275 
6            16       Chol. Lev.   > t6,4     V = V & T    t6,4        288 
6            17       Chol. Lev.   > t6,5     V = V & T    t6,5        303 
6            18       Chol. Lev.   > t6,6     V = V & T    t6,6        400 

[In each rule, T is a six-element binary vector over the rate classes; the individual 
entries of T, the conditions of rules A-C, and one digit of threshold t4,4 are only 
partly legible in the source.] 

Table 6 

As indicated earlier, domain knowledge may be represented by a set of rules. A 
typical rule may have the following format (once again using the afore-mentioned five 
hypothetical rate classes): "If the applicant's cholesterol level exceeds 252, he does 
not qualify for rate class C1, i.e., the best rate class for him is C2". For example, this 
rule can be expressed in a general IF-THEN rule as follows: 

IF x_i > t_{i,j}, THEN the best available rate class is C_j 

where x_i is the i-th feature, t_{i,j} is the j-th threshold of the i-th feature, and C_j is 
the j-th rate class. 

A vector with binary number "0" or "1" may be used to represent the consequent part 
of the IF-THEN rule. For example, [0, 1, 1, 1, 1] means the best rate class of C2 while 
[0, 0, 0, 1, 1] means the best rate class of C4. 

For each data point in the training data set, all rules that "fire" are checked and the 
intersection (i.e., the Boolean logic minimum) of the vectors of the firing rules is 
calculated, together with a vector that has an initial value of all ones. The value of the 
auxiliary feature may then be determined by counting the number of ones in the final 
vector. As can be seen, the auxiliary feature takes integer numbers ranging from one 
to the number of rate classes. The following pseudo-code summarizes the procedure of 
determining the value of the auxiliary feature. 

FOR each of the data points in the training set 
    Initialize vector V = [1, 1, 1, 1, 1] 
    FOR each of the rules 
        IF the i-th rule is fired, THEN V = V & T_i ("&" is logic AND) 
    END of all rules 
    The value of the auxiliary feature = the number of ones in the vector V. 
END of all data points 
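By way of a non-limiting illustration, the procedure for determining the auxiliary feature may be sketched in executable form as follows. The sketch uses the five hypothetical rate classes, and the three rules shown (two cholesterol level thresholds and one BMI threshold) are an illustrative subset with hypothetical feature names; they are not the complete rule set.

```python
N_CLASSES = 5


def consequent(best_class):
    """Binary vector for "best available class is C_j", e.g. C2 -> [0,1,1,1,1]."""
    return [0] * (best_class - 1) + [1] * (N_CLASSES - best_class + 1)


# Illustrative rules: (feature name, threshold, best available rate class).
RULES = [
    ("chol_level", 252.0, 2),
    ("chol_level", 275.0, 3),
    ("bmi", 28.81, 2),
]


def tag(case):
    """Count the ones left after AND-ing the vectors of all rules that fire."""
    v = [1] * N_CLASSES
    for feature, threshold, best_class in RULES:
        if case[feature] > threshold:          # the rule "fires"
            v = [a & b for a, b in zip(v, consequent(best_class))]
    return sum(v)


print(tag({"chol_level": 200.0, "bmi": 24.0}))  # no rule fires -> 5
print(tag({"chol_level": 260.0, "bmi": 24.0}))  # best class C2 -> 4
print(tag({"chol_level": 280.0, "bmi": 30.0}))  # best class C3 -> 3
```

A larger value of the feature therefore indicates that more rate classes remain available to the applicant after the rules have been applied.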

After obtaining the value of the auxiliary feature for each data point, the auxiliary 
feature may be treated as a regular feature and included into the final feature set. The 
neural network may then be trained and tested with the final feature set. Because of 
the additional information provided by the auxiliary feature, the neural network may 
be "guided" during learning to more quickly find the piecewise continuous decision 
boundary, which not only reduces the training time and efforts, but may also improve 
the classification performance of the neural network classifier. 

Additional features that may be used for neural network classifier design include, but 
are not limited to, tag, BMI, diastolic and/or systolic blood pressure readings, 
cholesterol level, cholesterol ratio, various liver enzymes, such as SGOT (Serum 
Glutamic Oxaloacetic Transaminase), SGPT (Serum Glutamic Pyruvic 
Transaminase), GGT (Gamma-Glutamyl Transferase), nicotine use history, and 
various aspects of family history. 

There are a number of types of neural networks. According to an embodiment of the 
invention, a three-layer feed-forward neural network with back propagation learning 
may be used. Two separate models may be used for nicotine and non-nicotine cases, 
respectively. By way of example, for nicotine cases, there may be three rate classes, 
e.g., "Preferred_nic," "Standardplus_nic," and "Standard_nic," while non-nicotine cases 
may have five rate classes, e.g., "Best," "Preferred," "Select," "Standardplus," and 
"Standard." Both models are multiple-class classifiers. A neural network with 
multiple output nodes may be a typical design for multiple-class classifiers where 
each of the neural network output nodes corresponds to one class. However, neural 
networks with multiple output nodes may have a large number of weights and biases, 
and thus require a large training data set and more training time for properly training 
the network. If the data size is relatively small compared to the number of features 
and the number of classes, multiple binary neural networks may be used to perform 
the multiple-class classification. Using multiple binary networks may not only reduce 
the complexity of the network, thus reducing the training time, but may also improve 
the classification performance. An example of the architecture of a neural network 
classifier is illustrated in Fig. 51. The non-nicotine model 5110 has five binary 
classifiers 5120 while the nicotine model 5130 has three binary classifiers 5140. Each 
model 5110, 5130 has a MAX function 5150 and 5160. Applications in the non- 
nicotine model 5110 are then assigned to the appropriate rate class 5170, while 
applications in the nicotine model 5130 are assigned to the appropriate rate class 
5180. 

In the example of Fig. 51, each binary network has the structure of 12-5-1, e.g., 
twelve input nodes, five hidden neurons, and one output node. Activation functions 
for both hidden and output neurons may be logistic sigmoidal functions. According to 
an embodiment of the invention, the range of target values may be scaled to [0.1, 0.9] 
to prevent saturation during the training process. The Levenberg-Marquardt numerical 
optimization technique may be used as the backpropagation-learning algorithm to 
achieve second-order training speed. 
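By way of a non-limiting illustration, the forward pass of one 12-5-1 binary network with logistic sigmoidal activations, together with the scaling of 0/1 targets to [0.1, 0.9], may be sketched as follows. Training (e.g., by the Levenberg-Marquardt technique) is not shown; the weights here are random placeholders, and all function names and the input values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoidal activation."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in=12, n_hidden=5, seed=0):
    """Random placeholder weights; the last entry of each row is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Propagate one input vector through the hidden and output layers."""
    hidden, output = network
    # zip() stops at len(x), so the trailing bias weight is added separately.
    h = [sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in hidden]
    return sigmoid(output[-1] + sum(wi * hi for wi, hi in zip(output, h)))


def scale_target(label, low=0.1, high=0.9):
    """Map a 0/1 training target into the [low, high] interval."""
    return low + (high - low) * label


def n_parameters(network):
    """Total number of weights and biases in the network."""
    hidden, output = network
    return sum(len(w) for w in hidden) + len(output)


net = init_network()
print(n_parameters(net), forward(net, [0.5] * 12))
```

A 12-5-1 network of this form has 5 x 13 + 6 = 71 weights and biases, which illustrates why a bank of small binary networks may be easier to train on limited data than one large multi-output network.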

Each binary network represents an individual rate class and is trained with the targets 
of one-vs-other. During classification for an unknown case, each network provides the 
probability of the unknown case belonging to the class it represents. The final rate 
class of the unknown case is determined by the MAX decision rule, e.g., given a 
vector whose entry values are in the interval [0,1], the MAX rule will return the 
position of the largest entry. 

To further improve the classification performance, it may be advantageous to apply 
some post-processing techniques to the outputs of the individual networks, prior to the 
MAX decision making process. Instead of assigning a rate class to an unknown case 
just based on the maximum outputs of the individual networks, the distribution of the 
outputs is characterized. If the distribution of the outputs does not meet certain pre- 
defined criteria, no decision needs to be made by the classifier. Rather, the case will 
be sent to a human underwriter for evaluation. The rationale here is that if a correct 
decision cannot be made, it would be preferable that the classifier makes no decision 
rather than the wrong decision. Considering the neural network outputs as discrete 
membership grades for all rate classes, the four features that characterize the 
membership grades may be the same as those set forth above with respect to the 
fusion module, i.e., cardinality (C), entropy (E), the difference (D) between the 
highest and the second highest values of outputs, and the separation (S) between the 
rank orders of the highest and the second highest values of outputs. 

Again, with the features defined for characterizu^ the networic outputs, the followii^ 
two-step criteria may be used for "rejecting" the cases: 

Stepl: C<rj OR C>r2 OR JE>T3 
Step 2: I><r4 AND 5^1 

Where r, ,r2,r3 , and are the thresholds. The value of the tiiresholds is typically 

data set dependent. In itas raibodiment, the value of the thresholds are first 
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empirically estimated and then fine-tuned by evolutionary algorithms (EA). The final 
numbers for all five-fold data sets are illustrated in Table 7 below: 



Non-nicotine model

        Run #1    Run #2    Run #3    Run #4         Run #5
τ1      0.5       0.5       0.5       0.5            0.5
τ2      2.0       2.0       2.0       2.0            2.0
τ3      0.9       0.9       0.9       [illegible]    0.98
τ4      0.1       0.15      0.1       0.1            0.07

Nicotine model

        Run #1    Run #2    Run #3    Run #4         Run #5
τ1      0.3       0.3       0.3       0.3            0.3
τ2      1.75      1.75      1.75      1.75           1.75
τ3      0.85      0.85      0.8       0.85           0.85
τ4      0.2       0.25      0.2       0.2            0.2

Table 7
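
The two-step rejection criteria can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the names `Thresholds` and `should_reject` are hypothetical, the sketch assumes a case is rejected when either step's condition holds, and it assumes the four rows of Table 7 correspond, in order, to the four thresholds.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    t1: float  # lower limit on cardinality C
    t2: float  # upper limit on cardinality C
    t3: float  # upper limit on entropy E
    t4: float  # lower limit on difference D


def should_reject(c, e, d, s, th):
    """Return True when the case should be sent to a human underwriter.

    Assumption: the case is rejected if Step 1 OR Step 2 is satisfied.
    """
    step1 = c < th.t1 or c > th.t2 or e > th.t3
    step2 = d < th.t4 and s <= 1
    return step1 or step2


# Run #1 values for the non-nicotine model, read from Table 7
# (row-to-threshold mapping is an assumption).
non_nicotine_run1 = Thresholds(t1=0.5, t2=2.0, t3=0.9, t4=0.1)
```

For example, a case with C = 1.0, E = 0.3, D = 0.05 and S = 1 would be rejected under Step 2, because the two leading rate classes are adjacent and nearly tied.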

According to an embodiment of the invention, a neural network classifier may be
implemented using software code, and tested against a case base. By way of example,
a software implementation of a neural network may use a case base of 2,879 cases.
After removal of 173 UW cases, the remaining 2,706 cases were used for training and
testing the neural network classifier. Five-fold cross-validation was used to estimate
the performance of the classifier.
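
A five-fold split of the 2,706 cases might be formed as sketched below. The specification does not state how the folds were drawn; the random shuffle, the fixed seed, and the function names are assumptions made for illustration.

```python
import random


def five_fold_indices(n_cases, k=5, seed=0):
    """Split case indices 0..n_cases-1 into k disjoint, near-equal folds."""
    indices = list(range(n_cases))
    # Shuffling with a fixed seed is an assumption, used for repeatability.
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_cases, k=5, seed=0):
    """Yield one (train, test) pair of index lists per fold."""
    folds = five_fold_indices(n_cases, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(train_test_splits(2706))
```

Each case appears in exactly one test fold, so every case is classified once by a network that did not see it during training.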

The combined confusion matrices of the five-fold runs are illustrated in Fig. 52. For
comparison, the combined confusion matrices for the five-fold runs after
post-processing are illustrated in Fig. 53. The performance for this example before
post-processing is provided in Fig. 54, while the performance for this example after
post-processing is provided in Fig. 55.
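
Combining the per-fold confusion matrices amounts to an element-wise sum, as in the sketch below. The figures themselves are not reproduced here, and the performance measures they report are not restated in this passage; the overall accuracy computed by `accuracy` is only an assumed example of a measure that can be read off a combined matrix, and all function names are hypothetical.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count cases by (actual rate class, predicted rate class)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def combine(matrices):
    """Element-wise sum of the per-fold confusion matrices."""
    n = len(matrices[0])
    total = [[0] * n for _ in range(n)]
    for m in matrices:
        for i in range(n):
            for j in range(n):
                total[i][j] += m[i][j]
    return total


def accuracy(matrix):
    """Fraction of cases on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0
```

Cases rejected during post-processing would simply be left out of the matrix for that run, which is why the before and after matrices can differ in their totals.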

According to an embodiment of the invention, the systems and processes described in
this invention may be implemented on any general purpose computational device,
either as a standalone application or applications, or even across several general
purpose computational devices connected over a network and as a group operating in
a client-server mode. According to another embodiment of the invention, a
computer-usable and writeable medium having a plurality of computer readable program code
stored therein may be provided for practicing the process of the present invention.
The process and system of the present invention may be implemented within a variety
of operating systems, such as a Windows® operating system, various versions of a
Unix-based operating system (e.g., a Hewlett Packard, a Red Hat, or a Linux version
of a Unix-based operating system), or various versions of an AS/400-based operating
system. For example, the computer-usable and writeable medium may be comprised
of a CD ROM, a floppy disk, a hard disk, or any other computer-usable medium. One
or more of the components of the system or systems embodying the present invention
may comprise computer readable program code in the form of functional instructions
stored in the computer-usable medium such that when the computer-usable medium is
installed on the system or systems, those components cause the system to perform the
functions described. The computer readable program code for the present invention
may also be bundled with other computer readable program software. Also, only
some of the components may be provided in computer-readable code.

Additionally, various entities and combinations of entities may employ a computer to
implement the components performing the above-described functions. According to
an embodiment of the invention, the computer may be a standard computer
comprising an input device, an output device, a processor device, and a data storage
device. According to other embodiments of the invention, various components may
be computers in different departments within the same corporation or entity. Other
computer configurations may also be used. According to another embodiment of the
invention, various components may be separate entities such as corporations or
limited liability companies. Other embodiments, in compliance with applicable laws
and regulations, may also be used.

According to one specific embodiment of the present invention, the system may
comprise components of a software system. The system may operate on a network
and may be connected to other systems sharing a common database. Other hardware
arrangements may also be provided.

Other embodiments, uses and advantages of the present invention will be apparent to
those skilled in the art from consideration of the specification and practice of the
invention disclosed herein. The specification and examples should be considered
exemplary only. The intended scope of the invention is only limited by the claims
appended hereto.

While the invention has been particularly shown and described within the framework
of an insurance underwriting application, it will be appreciated that variations and
modifications can be effected by a person of ordinary skill in the art without departing
from the scope of the invention. For example, one of ordinary skill in the art will
recognize that certain classifiers can be applied to any other transaction-oriented
process in which underlying risk estimation is required to determine the price
structure (e.g., premium, price, commission, etc.) of an offered product, such as
insurance, re-insurance, annuities, etc. Furthermore, one of ordinary skill in the art
will recognize that such decision engines do not need to be restricted to insurance
underwriting applications.

CLAIMS

What is claimed is: 

1. A process for preparing an insurance application for underwriting based on a
plurality of previous insurance application underwriting decisions, the process
comprising:

receiving a request (3560) to underwrite the insurance application;

assigning a risk classification to the insurance application (3800);

defining a set (3804, 3808) comprising at least one of the plurality of previous
insurance application underwriting decisions;

comparing the insurance application to the set (3530); and

designating the insurance application based at least in part on the comparison between
the insurance application and the set and the risk classification assigned to the
insurance application, where the designation is one of:

designating the insurance application as an outlier insurance application (3540); and

designating the insurance application for underwriting.

2. The process according to claim 1, wherein the insurance application comprises
at least one feature; and

where comparing the insurance application to the set further comprises:

comparing at least one feature of the insurance application to a corresponding feature
in the at least one of the plurality of previous insurance application underwriting
decisions in the set.

3. The process according to claim 2, where:

comparing the at least one feature to a corresponding feature in the set further
comprises determining if the at least one feature in the insurance application is
dominated by the corresponding feature in the first set.

4. The process according to claim 1, wherein comparing the insurance
application to the set further comprises:

determining if the insurance application is dominated (3540) by the at least one
application in the first set.

5. The process according to claim 1, wherein comparing the insurance 
application to the set further comprises: 

comparing the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set.

6. The process according to claim 1, where the at least one insurance application
in the set includes a classification assignment; and

where designating the insurance application as an outlier insurance application occurs
when a dominance relationship between the insurance application and one of the at
least one insurance application in the set is inconsistent with the relationship of the
classification assignments between the insurance application and the one of the at
least one insurance application in the set.

7. A process for preparing an insurance application for underwriting based on a 
plurality of previous insurance application underwriting decisions, the process 
comprising: 

receiving a request (3560) to underwrite the insurance application, wherein the
insurance application comprises at least one feature;

assigning a risk classification to the insurance application (3800);

defining a set (3804, 3808) comprising at least one of the plurality of previous
insurance application underwriting decisions;

comparing the insurance application to the set (3530), where comparing comprises:

comparing at least one feature of the insurance application to a corresponding feature
in the at least one of the plurality of previous insurance application underwriting
decisions in the set; and

comparing the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set; and

designating the insurance application based at least in part on the comparison between
the insurance application and the set and the risk classification assigned to the
insurance application, where the designation is one of:

designating the insurance application as an outlier insurance application (3540); and

designating the insurance application for underwriting.

8. The process according to claim 7, where:

comparing the at least one feature to a corresponding feature in the set further
comprises determining if the at least one feature in the insurance application is
dominated by the corresponding feature in the set.

9. The process according to claim 7, where designating the insurance application
as an outlier insurance application occurs when a dominance relationship between the
insurance application and one of the at least one insurance application in the set is
inconsistent with the relationship of the classification assignments between the
insurance application and the one of the at least one insurance application in the set.

10. A computer readable medium having code for causing a processor to prepare
an insurance application for underwriting based on a plurality of previous insurance
application underwriting decisions, the medium comprising:

code for receiving a request (3560) to underwrite the insurance application;

code for assigning a risk classification (3800) to the insurance application;

code for defining a set (3804, 3808) comprising at least one of the plurality of
previous insurance application underwriting decisions;

code for comparing the insurance application to the set (3530); and

code for designating the insurance application based at least in part on the comparison
between the insurance application and the set and the risk classification assigned to
the insurance application, where the designation is one of:

designating the insurance application as an outlier insurance application (3540); and

designating the insurance application for underwriting.

11. The medium according to claim 10, wherein the insurance application
comprises at least one feature; and

where the code for comparing the insurance application to the set further comprises:

code for comparing at least one feature of the insurance application to a
corresponding feature in the at least one of the plurality of previous insurance
application underwriting decisions in the set.

12. The medium according to claim 11, where:

the code for comparing the at least one feature to a corresponding feature in the set
further comprises determining if the at least one feature in the insurance application is
dominated by the corresponding feature in the set.

13. The medium according to claim 10, wherein the code for comparing the
insurance application to each of the set further comprises:

code for determining if the insurance application is dominated by the at least one
application in the set.

14. The medium according to claim 10, wherein the code for comparing the
insurance application to each of the set further comprises:

code for comparing the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set.

15. The medium according to claim 10, where the at least one insurance
application in the set includes a classification assignment; and

where designating the insurance application as an outlier insurance application occurs
when a dominance relationship between the insurance application and one of the at
least one insurance application in the set is inconsistent with the relationship of the
classification assignments between the insurance application and the one of the at
least one insurance application in the set.

16. A computer readable medium having code for causing a processor to prepare
an insurance application for underwriting based on a plurality of previous insurance
application underwriting decisions, the medium comprising:

code for receiving a request (3860) to underwrite the insurance application, wherein
the insurance application comprises at least one feature;

code for assigning a risk classification to the insurance application (3800);

code for defining a set (3804, 3808) comprising at least one of the plurality of
previous insurance application underwriting decisions;

code for comparing the insurance application to the set (3530), where the code for
comparing comprises:

code for comparing at least one feature of the insurance application to a
corresponding feature in the at least one of the plurality of previous insurance
application underwriting decisions in the set; and

code for comparing the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set; and

code for designating the insurance application based at least in part on the comparison
between the insurance application and the set and the risk classification assigned to
the insurance application, where the designation is one of:

designating the insurance application as an outlier insurance application (3540); and

designating the insurance application for underwriting.

17. The medium according to claim 16, where:

the code for comparing the at least one feature to a corresponding feature in the set
further comprises code for determining if the at least one feature in the insurance
application is dominated by the corresponding feature in the set.

18. The medium according to claim 16, where designating the insurance 
application as an outlier insurance application occurs when a dominance relationship
between the insurance application and one of the at least one insurance application in
the set is inconsistent with the relationship of the classification assignments between
the insurance application and the one of the at least one insurance application in the
set.

19. A system to prepare an insurance application for underwriting based on a 
plurality of previous insurance application underwriting decisions (3610), the system
comprising:

means for receiving a request (3560) to underwrite the insurance application;

means for assigning a risk classification (3800) to the insurance application;

means for defining a set (3804, 3808) comprising at least one of the plurality of
previous insurance application underwriting decisions;

means for comparing the insurance application (3530) to the set; and

means for designating the insurance application based at least in part on the
comparison between the insurance application and the set and the risk classification
assigned to the insurance application, where the designation is one of:

designating the insurance application as an outlier insurance application (3540); and

designating the insurance application for underwriting.

20. The system according to claim 19, wherein the insurance application 
comprises at least one feature; and 

where the means for comparing the insurance application to the set further comprises: 

means for comparing at least one feature of the insurance application to a 
corresponding feature in the at least one of the plurality of previous insurance 
application underwriting decisions in the set.

21. The system according to claim 20, where:

the means for comparing the at least one feature to a corresponding feature in the set
further comprises determining if the at least one feature in the insurance application is
dominated by the corresponding feature in the set.

22. The system according to claim 19, wherein the means for comparing the 
insurance application to the set further comprises: 

means for determining if the insurance application is dominated by the at least one 
application in the set.

23. The system according to claim 19, wherein the means for comparing the
insurance application to the set further comprises: 

means for comparing the classification assignment of the insurance application to the 
classification assignment of the at least one of the plurality of previous insurance 
application underwriting decisions in the set.

24. The system according to claim 19, where the at least one insurance application
in the set includes a classification assignment; and

where designating the insurance application as an outlier insurance application occurs
when a dominance relationship between the insurance application and one of the at
least one insurance application in the set is inconsistent with the relationship of the
classification assignments between the insurance application and the one of the at
least one insurance application in the set.

25. A system to prepare an insurance application for underwriting based on a 
plurality of previous insurance application underwriting decisions (3610), the system
comprising:

means for receiving a request (3560) to underwrite the insurance application, wherein
the insurance application comprises at least one feature;

means for assigning a risk classification (3800) to the insurance application;

means for defining a set (3804, 3808) comprising at least one of the plurality of
previous insurance application underwriting decisions;

means for comparing the insurance application (3530) to the set, where the code for
comparing comprises:

means for comparing at least one feature of the insurance application to a
corresponding feature in the at least one of the plurality of previous insurance
application underwriting decisions in the set; and

means for comparing the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set; and

means for designating the insurance application based at least in part on the
comparison between the insurance application and the set and the risk classification
assigned to the insurance application, where the designation is one of:

designating the insurance application as an outlier insurance application (3540); and

designating the insurance application for underwriting.

26. The system according to claim 25, where:

the means for comparing the at least one feature to a corresponding feature in the set
further comprises code for determining if the at least one feature in the insurance
application is dominated by the corresponding feature in the set.

27. The system according to claim 25, where designating the insurance application 
as an outlier insurance application occurs when a dominance relationship between the 
insurance application and one of the at least one insurance application in the set is 
inconsistent with the relationship of the classification assignments between the 
insurance application and the one of the at least one insurance application in the set.

28. A system to prepare an insurance application for underwriting based on a
plurality of previous insurance application underwriting decisions (3610), the system
comprising:

a receiver for receiving a request (3560) to underwrite the insurance application;

a classifier module assigning a risk classification (3800) to the insurance application;

a definition module for defining a set (3804, 3808) comprising at least one of the
plurality of previous insurance application underwriting decisions; and

a comparison module comparing the insurance application (3530) to the set; and

a designation module for designating the insurance application based at least in part
on the comparison between the insurance application and the set and the risk
classification assigned to the insurance application, where the designation is one of: 

designating the insurance application as an outlier insurance application (3540); and 

designating the insurance application for underwriting.

29. The system according to claim 28, wherein the insurance application
comprises at least one feature; and

where the comparison module further compares at least one feature of the insurance
application to a corresponding feature in the at least one of the plurality of previous
insurance application underwriting decisions in the set.

30. The system according to claim 29, where: 

the comparison module further determines if the at least one feature in the insurance
application is dominated by the corresponding feature in the set. 

31. The system according to claim 28, wherein the comparing module further 
determines if the insurance application is dominated by the at least one application in 
the set.

32. The system according to claim 28, wherein the comparison module further
compares the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set. 

33. The system according to claim 28, where the at least one insurance application 
in the set includes a classification assignment; and 

where designating the insurance application as an outlier insurance application occurs
when a dominance relationship between the insurance application and one of the at
least one insurance application in the set is inconsistent with the relationship of the
classification assignments between the insurance application and the one of the at
least one insurance application in the set.

34. A system for preparing an insurance application for underwriting based on a 
plurality of previous insurance application underwriting decisions (3610), the system 
comprising:

a receiver for receiving a request (3560) to underwrite the insurance application,
wherein the insurance application comprises at least one feature;

a classifier module for assigning a risk classification (3800) to the insurance
application;

a definition module for defining a set (3804, 3808) comprising at least one of the
plurality of previous insurance application underwriting decisions; and

a comparison module for:

a) comparing the insurance application (3530) to the set;

b) comparing at least one feature of the insurance application to a
corresponding feature in the at least one of the plurality of previous insurance
application underwriting decisions in the set;

c) comparing the classification assignment of the insurance application to the
classification assignment of the at least one of the plurality of previous insurance
application underwriting decisions in the set; and

a designation module for designating the insurance application based at least in part
on the comparison between the insurance application and the set and the risk
classification assigned to the insurance application, where the designation is one of:

designating the insurance application as an outlier insurance application (3540); and

designating the insurance application for underwriting.

35. The system according to claim 34, where: 

the comparison module further determines if the at least one feature in the insurance
application is dominated by the corresponding feature in the first set.

36. The system according to claim 34, where designating the insurance application 
as an outlier insurance application occurs when a dominance relationship between the
insurance application and one of the at least one insurance application in the set is
inconsistent with the relationship of the classification assignments between the
insurance application and the one of the at least one insurance application in the set.